Impressions of convexity - An illustration for commutator bounds 



David Wenzel 

Fakultät für Mathematik, TU Chemnitz
09107 Chemnitz, Germany 

Koenraad M.R. Audenaert

Mathematics Department, 
Royal Holloway, University of London,
Egham TW20 0EX, United Kingdom



Abstract 

We determine the sharpest constant $C_{p,q,r}$ such that for all complex matrices X and Y, and for Schatten p-, q- and r-norms, the inequality

$$\|XY - YX\|_p \le C_{p,q,r}\,\|X\|_q\|Y\|_r$$

is valid. The main theoretical tool in our investigations is complex interpolation theory.
1. Introduction

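In this paper we determine the sharpest constant $C_{p,q,r}$ such that for all complex matrices X and Y the inequality

$$\|XY - YX\|_p \le C_{p,q,r}\,\|X\|_q\|Y\|_r \qquad (1)$$

is valid. Here, all norms are Schatten norms, i.e.

$$\|X\|_p = \Big(\sum_{j=1}^{d}\sigma_j^p\Big)^{1/p} \quad (1 \le p < \infty), \qquad \|X\|_\infty = \sigma_1,$$

with $\sigma_j$ the decreasingly ordered singular values $\sigma_1 \ge \dots \ge \sigma_d \ge 0$ of X.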
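To make the definition concrete, here is a minimal Python sketch (standard library only) that computes the singular values of a 2 × 2 complex matrix in closed form and evaluates the Schatten norm from them. The helper names `singular_values_2x2` and `schatten_norm` are our own, and the closed form is restricted to the 2 × 2 case; for general d one would use a numerical SVD instead.

```python
import math


def singular_values_2x2(a):
    """Singular values (descending) of a 2x2 complex matrix a = [[a00, a01], [a10, a11]].

    They are the square roots of the eigenvalues of the Hermitian matrix H = A* A;
    this closed form only works for 2x2 matrices.
    """
    (a00, a01), (a10, a11) = a
    h00 = abs(a00) ** 2 + abs(a10) ** 2
    h11 = abs(a01) ** 2 + abs(a11) ** 2
    h01 = a00.conjugate() * a01 + a10.conjugate() * a11
    largest = math.sqrt((h00 + h11) / 2 + math.hypot((h00 - h11) / 2, abs(h01)))
    # sigma_1 * sigma_2 = |det A| yields the smaller value without cancellation.
    smallest = abs(a00 * a11 - a01 * a10) / largest if largest > 0 else 0.0
    return [largest, smallest]


def schatten_norm(sigma, p):
    """Schatten p-norm from singular values: (sum_j sigma_j^p)^(1/p), or sigma_1 for p = inf."""
    if p == math.inf:
        return max(sigma)
    return sum(s ** p for s in sigma) ** (1.0 / p)


sigma = singular_values_2x2([[3, 0], [0, 4]])
print(sigma)                                                # [4.0, 3.0]
print([schatten_norm(sigma, p) for p in (1, 2, math.inf)])  # [7.0, 5.0, 4.0]
```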

This question is a straightforward continuation of a line of investigation about analogous inequalities considered previously with special choices for the norm indices p, q and r. For instance, in [?] one of us raised the conjecture that in the case q = p one has

$$C_{p,p,r} = 2^{\max\{1/p,\ 1-1/p,\ 1-1/r\}}. \qquad (2)$$

We want to show the validity of this conjecture and carry over the developed ideas to the general situation. We will also take a closer look at the cases of equality in (1), studied previously for p = q = r = 2 in [?].

The main technique used in this paper is complex interpolation à la Riesz-Thorin, applied in a rather intricate way to the problem at hand. To achieve optimal clarity, the exposition will partially depart from the usual format, with two effects. First, while certain steps in the proofs later turn out to be redundant, we have chosen to keep them because of their use in the development of the complete proof and their importance in obtaining a better understanding of what is going on behind the scenes. Secondly, some parts do not follow the usual structure and should be understood as a written presentation that will guide the reader through our thoughts.

Email addresses: david.wenzel@s2000.tu-chemnitz.de (David Wenzel), koenraad.audenaert@rhul.ac.uk (Koenraad M.R. Audenaert)
1.1. Notations

We will use some abbreviations in formulas: the Lie bracket $[X,Y] = XY - YX$ for the commutator, $\sigma(X)$ for the vector of singular values of X, $\operatorname{Tr} X$ for its trace, $X^T$ for its transpose and $X^*$ for its adjoint. Moreover, O will denote a zero matrix of appropriate size, $I_n$ an $n \times n$ identity matrix, and $A \oplus B = \operatorname{Diag}(A,B)$ will be written for the construction of block diagonal matrices. For any norm index $p \in [1,\infty]$, $p'$ denotes the conjugate index of p, i.e. the number $p' \in [1,\infty]$ satisfying $\frac1p + \frac1{p'} = 1$. As is well known, the Schatten p' norm is the dual norm of the Schatten p norm. Note that we took the formal equality $\|X\|_p = \|\sigma(X)\|_p$ as sufficient reason for denoting the usual $\ell_p$ norm of a vector also by $\|\cdot\|_p$.

1.2. Illustrations

Throughout the paper our proofs will be of a very pictorial nature, because there are so many special cases 
to be considered, and it so happens that these cases can be presented graphically in a very clear way. We 
hope that this will allow the reader to gain a better understanding of the several steps and at the same time 
quickly obtain an overall view of the whole proof. 

As stated before, the topic of this paper is finding the best constant $C_{p,q,r}$ in (1), where p, q and r are norm indices, $1 \le p, q, r \le \infty$. The triplet of values (p, q, r) can be depicted as a point in $\mathbb{R}^3$, or more precisely in $[1,\infty]^3$. The proofs of our theorems require subdividing this infinite cube into several regions, and rather than just defining these regions in the usual way (with equalities and inequalities), we will augment every definition with a graphical illustration of points and regions in $\mathbb{R}^3$ or (when we restrict to the case p = q) $\mathbb{R}^2$, where every real axis corresponds to one of these norm indices. In addition, we will use these pictures to display many other quantities that are important in the proofs, but that will become clear later on.

Of course, we need some device to portray the whole real line, or even only the semi-bounded interval $[1,\infty]$, in a finite space. So we need to cheat a little, and we will distort reality by mapping norm indices $p \in [1,\infty]$ to positions in the image given by the reciprocal of the conjugate index, $1/p'$. Applying this mapping

$$\mathrm{Img}: [1,\infty] \to \mathbb{R}, \quad p \mapsto \frac{1}{p'} = 1 - \frac1p,$$

in illustrations has several advantages (see Figure 1). Firstly, we obtain finite pictures, as $[1,\infty]$ is mapped onto $[0,1]$. Moreover, the unreachably far away index $p = \infty$ becomes the handy point $\mathrm{Img}\,\infty = 1$. The mapping preserves the order of the norm indices, i.e.

$$p < q \implies \mathrm{Img}\,p < \mathrm{Img}\,q.$$

So we are given just an appropriate scaling, and the smallest possible index p = 1 is of course the left-most point in the images. Last but not least, the index p = 2 is mapped exactly to the middle of the line segment, befitting its special role as the only self-conjugate index.
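The scaling can be sketched in a few lines of Python; `img` is our own name for the map Img, and `math.inf` stands for the index ∞. The checks mirror the properties just listed (endpoints, midpoint p = 2, order preservation).

```python
import math


def img(p):
    """Image position of a norm index p in [1, inf]: Img p = 1/p' = 1 - 1/p."""
    if p < 1:
        raise ValueError("norm indices must satisfy p >= 1")
    return 1.0 if p == math.inf else 1.0 - 1.0 / p


print(img(1), img(2), img(math.inf))   # 0.0 0.5 1.0
indices = [1, 1.5, 2, 3, 10, math.inf]
positions = [img(p) for p in indices]
print(positions == sorted(positions))  # True: the order of the indices is preserved
```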

As the first object of interest (2) involves two norm indices p and r, we are going to use two-dimensional images by applying the scaling function twice independently:

$$\mathrm{Img}^2: [1,\infty]\times[1,\infty] \to \mathbb{R}^2, \quad (p,r) \mapsto (\mathrm{Img}\,p, \mathrm{Img}\,r).$$

The result is a finite square whose center corresponds to the well-known special case p = r = 2 that was proved in [?]. There are some other nice side effects. The points satisfying r = p still form a straight line in the graphics. Moreover, the curve $r = p' = (1 - 1/p)^{-1}$ is mapped to the square's other diagonal (see Figure 2).

Figure 1: The scaling of norm indices for 1D imaging purposes.

Figure 2: The curves r = p (green) and r = p' (blue) in the original and the 2D scaled setting.
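A small check of the two statements about $\mathrm{Img}^2$, as a minimal illustration with our own helper names `img`, `conjugate` and `img2`: points on r = p land on the diagonal x = y, and points on r = p' land on the anti-diagonal x + y = 1, because $\mathrm{Img}\,p' = 1/p = 1 - \mathrm{Img}\,p$.

```python
import math


def img(p):
    """Img p = 1 - 1/p (with Img inf = 1)."""
    return 1.0 if p == math.inf else 1.0 - 1.0 / p


def conjugate(p):
    """Conjugate index p' with 1/p + 1/p' = 1 (1' = inf, inf' = 1)."""
    if p == 1:
        return math.inf
    if p == math.inf:
        return 1.0
    return p / (p - 1.0)


def img2(p, r):
    """Two-dimensional scaling (p, r) -> (Img p, Img r)."""
    return (img(p), img(r))


for p in [1, 1.25, 2, 3, 7, math.inf]:
    x, y = img2(p, p)
    u, v = img2(p, conjugate(p))
    assert x == y                      # r = p: main diagonal
    assert abs(u + v - 1.0) < 1e-12    # r = p': anti-diagonal
print(img2(2, 2))                      # (0.5, 0.5), the centre of the square
```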

Later on, when we study (1) in full generality, we will apply this same image scaling to three-dimensional pictures:

$$\mathrm{Img}^3: [1,\infty]^3 \to \mathbb{R}^3, \quad (p,q,r) \mapsto (\mathrm{Img}\,p, \mathrm{Img}\,q, \mathrm{Img}\,r).$$

There are again several curves that have lines as images. Furthermore, we will encounter some surfaces that are conveniently mapped to planes.

1.3. Basics on norm interpolation 

We want to briefly introduce a concept that is key to our proofs and will be used extensively in the remainder of the paper. More detailed explanations and additional applications can be found in [8].

In 1926 M. Riesz established a theorem that allows one to interpolate between two inequalities involving the usual $\ell_p$ vector norms. Stated in our notation:

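Theorem 1 (Riesz-Thorin). Let $1 \le p_0 \le p_1 \le \infty$ and $1 \le q_0 \le q_1 \le \infty$ be given such that

$$q_0 \le p_0 \quad\text{and}\quad q_1 \le p_1. \qquad (3)$$

If for a linear operator

$$T: \mathbb{R}^n \to \mathbb{R}^m \qquad (4)$$

there are $M_0, M_1 > 0$ such that

$$\|Tx\|_{p_0} \le M_0\|x\|_{q_0} \quad\text{and}\quad \|Tx\|_{p_1} \le M_1\|x\|_{q_1} \qquad (5)$$

for all arguments x, then for any $\theta \in [0,1]$ and every vector x the inequality

$$\|Tx\|_p \le M_0^{1-\theta}M_1^{\theta}\,\|x\|_q \qquad (6)$$

holds with $p \in [p_0,p_1]$, $q \in [q_0,q_1]$ defined by

$$\frac1p = \frac{1-\theta}{p_0} + \frac{\theta}{p_1} \quad\text{and}\quad \frac1q = \frac{1-\theta}{q_0} + \frac{\theta}{q_1}. \qquad (7)$$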
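The bookkeeping in (6) and (7) is easy to automate. The following Python sketch (hypothetical helper names, with 1/∞ treated as 0) returns the interpolated index for a given θ and the interpolated constant; it only does the arithmetic and does not check the hypotheses of the theorem.

```python
import math


def _reciprocal(s):
    """1/s with the convention 1/inf = 0."""
    return 0.0 if s == math.inf else 1.0 / s


def interpolated_index(p0, p1, theta):
    """Index p with 1/p = (1 - theta)/p0 + theta/p1, as in (7)."""
    recip = (1.0 - theta) * _reciprocal(p0) + theta * _reciprocal(p1)
    return math.inf if recip == 0.0 else 1.0 / recip


def interpolated_bound(m0, m1, theta):
    """Interpolated constant M0^(1 - theta) * M1^theta, as in (6)."""
    return m0 ** (1.0 - theta) * m1 ** theta


print(interpolated_index(2, math.inf, 0.5))   # 4.0
print(interpolated_bound(1.0, 4.0, 0.5))      # 2.0
```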

The theorem was enshrined in the fundamental methods of analysis when Riesz's student G.O. Thorin extended it to complex arguments and operators, obtaining an analogue of Theorem 1 with (4) replaced by

$$T: \mathbb{C}^n \to \mathbb{C}^m.$$

His proof, based on an ingenious use of Hadamard's three-line lemma from the theory of analytic functions, reveals the surprising fact that condition (3) is no longer necessary in the complex case (essentially because condition (5) must now hold for all complex vectors), an assertion that is completely wrong in the real case!

Afterwards, the result was extended to operators T defined on subspaces and, with the help of density arguments, to operators acting on infinite-dimensional spaces, in particular the $L^p$ spaces. Moreover, it was shown that, if x and Tx are matrices, the underlying norms may be replaced by their Schatten-type analogues. This holds due to a general equivalence between sequence spaces and the corresponding Schatten classes as far as interpolation is concerned [1].

Recently, in [?], one of us restated the theorem in terms of a special structure, the tensor product of argument vectors, with the purpose of investigating (1) with p = q = r. Although formulated in a more specific way in that paper, its wider validity was noted. Indeed, one can replace (4) by a linear operator T defined on a whole space of matrices (of the size of the Kronecker products below) with values in $\mathbb{C}^{m\times n}$, and substitute

$$x \mapsto X \otimes Y$$

with matrices X and Y of fixed sizes. That is, we are given a linear operator on the whole set of matrices, but only apply it to arguments that are tensor products (also called Kronecker products for matrices). The proof is an adaptation of Thorin's proof as presented in [?], combined with the fact that the generated simple functions (actually vectors in the finite-dimensional case) respect the tensor structure of the arguments. As for the original theorem, X and Y may be taken from subspaces of the respective matrix spaces.

In the aftermath of the WATIE 2009 conference we learned about the multilinear version of the Riesz-Thorin theorem. In the multilinear case (4) is replaced by the multilinear operator

$$T: \mathbb{C}^{k_1}\times\cdots\times\mathbb{C}^{k_m} \to \mathbb{C}^n,$$

(5) and (6) by inequalities of the form

$$\|T(x^{(1)},\dots,x^{(m)})\|_{p_j} \le M_j\,\|x^{(1)}\|_{q_{1,j}}\cdots\|x^{(m)}\|_{q_{m,j}}, \qquad j = 0, 1,$$

and (7) then comprises, besides the equality for p, m equalities fixing $q^{(1)},\dots,q^{(m)}$ [?].

Closer inspection revealed that the statement is actually equivalent to the usual interpolation but applied to tensor products, owing to the property $\|X \otimes Y\|_p = \|X\|_p\|Y\|_p$ of Schatten norms. Later on, we will see that the multilinear interpretation is too comprehensive for our needs, whereas the original interpolation theorem and its diagonal extension via tensor products serve their purpose very well. Be sure to read the acknowledgement for some more insights.

Our scaling function (Section 1.2) is especially convenient for picturing certain salient aspects related to norm interpolation. The Riesz-Thorin theorem, in particular (7), tells us that in terms of reciprocals, the interpolated index $\frac1p$ is a convex combination of the base indices $\frac1{p_0}$ and $\frac1{p_1}$. This is the reason why the Riesz-Thorin theorem is sometimes called a convexity theorem.

If the norms of argument and target space are different (i.e. $p_j \ne q_j$), we need to consider a joint convex combination of the index reciprocals. Due to the way Img is defined, in the images of this paper $\mathrm{Img}^2(p,q)$ conveniently lies on the straight line segment between $\mathrm{Img}^2(p_0,q_0)$ and $\mathrm{Img}^2(p_1,q_1)$.

We call the points $(p_j,q_j)$ interpolation base points and all points (p, q) subject to (7) interpolants. We carry over this nomenclature to the associated inequalities and to the images of the points in our pictures. Note that for real interpolation both base points are necessarily located in the lower triangle determined by the main diagonal q = p (Figure 3).
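The collinearity statement can be verified numerically: since $\mathrm{Img}\,p = 1 - 1/p$ is affine in $1/p$, the image of an interpolant is the same convex combination of the images of the base points. The sketch below uses our own helper names and arbitrarily chosen base points.

```python
import math


def recip(s):
    """1/s with the convention 1/inf = 0."""
    return 0.0 if s == math.inf else 1.0 / s


def img(p):
    """Img p = 1 - 1/p."""
    return 1.0 - recip(p)


def interpolant(p0, q0, p1, q1, theta):
    """Interpolated pair (p, q) from (7): both reciprocals are the same convex combination."""
    def from_recip(x):
        return math.inf if x == 0.0 else 1.0 / x
    p = from_recip((1 - theta) * recip(p0) + theta * recip(p1))
    q = from_recip((1 - theta) * recip(q0) + theta * recip(q1))
    return p, q


# Arbitrary base points (p_j, q_j), chosen for illustration only.
base0, base1 = (2.0, 1.5), (math.inf, 3.0)
for theta in [0.0, 0.25, 0.5, 0.9]:
    p, q = interpolant(*base0, *base1, theta)
    expected = [(1 - theta) * img(a) + theta * img(b) for a, b in zip(base0, base1)]
    assert all(abs(x - y) < 1e-12 for x, y in zip((img(p), img(q)), expected))
print("all images lie on the segment between the two base points")
```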

1.4. Overview

For the sake of clarity, in Section 2 we start by treating the original and simpler conjectured inequality about the constant $C_{p,p,r}$. Since only two parameters enter the treatment, the pictures are 2-dimensional. In Section 3 the approach used in Section 2 is generalised to treat as much of the general 3-parameter problem as possible. In the course of this process, we will encounter a number of parameter regions that could not, as yet, be treated using the interpolation methods applied in Section 2. To overcome this hurdle, two things are needed. Firstly, the value of $C_{p,q,r}$ in certain extremal points of parameter space must be established. This is done in Section 4 using a combination of basic linear algebra methods and esoteric knowledge about certain magical symbols. Secondly, the remaining areas of parameter space have to be covered, and this is done in Section 4.3 using more advanced versions of Riesz-Thorin interpolation. Thus, the proof of our main theorem is finished at that point. We hasten to add that for certain areas in parameter space the constant $C_{p,q,r}$ depends on the dimension d of the matrices. Furthermore, in some instances the interpolation method did not yield the sharpest possible bound. In Section 5 the cases of equality are considered, and we wrap up with a conclusion (Section 6) and a list of recommended readings.

2. The original conjecture and its proof 

This section is dedicated to the derivation of (2), which is the following theorem.

Theorem 2. With the notations of equation (1),

$$C_{p,p,r} = \max\{2^{1/p},\ 2^{1-1/p},\ 2^{1-1/r}\}.$$

This is the original conjecture stated in [?]. The proof we give here is somewhat longer than it could have been, but in this way it clearly demonstrates the power and applicability of interpolation. Near the end of this section, the reader will notice that the proof may be shortened a bit.

2.1. The claim and some special situations 

To begin with, we ensure that the value claimed for $C_{p,p,r}$ can be attained. For this, take a look at the examples (8), yielding the quotients

$$\frac{\|[X,Y]\|_p}{\|X\|_p\|Y\|_r} = 2^{1-1/p} \quad\text{as well as}\quad \frac{\|[X,Y]\|_p}{\|X\|_p\|Y\|_r} = 2^{1-1/r},$$

and (9), giving the value $2^{1/p}$. Hence, the constant $C_{p,p,r}$ cannot be smaller than asserted.
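The three quotients can be illustrated with simple 2 × 2 matrices. The matrices D, E and F below are our own hypothetical choice and need not coincide with the examples (8) and (9); the sketch uses only the standard library and computes singular values of real 2 × 2 matrices in closed form.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def commutator(x, y):
    """[X, Y] = XY - YX for 2x2 matrices."""
    xy, yx = matmul(x, y), matmul(y, x)
    return [[xy[i][j] - yx[i][j] for j in range(2)] for i in range(2)]


def schatten(a, p):
    """Schatten p-norm of a real 2x2 matrix via closed-form singular values."""
    (a00, a01), (a10, a11) = a
    h00, h11, h01 = a00**2 + a10**2, a01**2 + a11**2, a00 * a01 + a10 * a11
    s1 = math.sqrt((h00 + h11) / 2 + math.hypot((h00 - h11) / 2, h01))
    s2 = abs(a00 * a11 - a01 * a10) / s1 if s1 > 0 else 0.0   # s1 * s2 = |det A|
    return s1 if p == math.inf else (s1**p + s2**p) ** (1 / p)


def quotient(x, y, p, r):
    """||[X, Y]||_p / (||X||_p ||Y||_r), i.e. the case q = p of (1)."""
    return schatten(commutator(x, y), p) / (schatten(x, p) * schatten(y, r))


# Hypothetical test matrices (our choice, not necessarily those of (8) and (9)).
D = [[1, 0], [0, -1]]   # diagonal, singular values (1, 1)
E = [[0, 1], [0, 0]]    # nilpotent, rank one
F = [[0, 0], [1, 0]]    # transpose of E
p, r = 3.0, 1.5
print(quotient(D, E, p, r), 2 ** (1 - 1 / p))   # both approx. 1.5874
print(quotient(E, D, p, r), 2 ** (1 - 1 / r))   # both approx. 1.2599
print(quotient(E, F, p, r), 2 ** (1 / p))       # both approx. 1.2599
```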




Figure 3: Left: Illustrating interpolation base points $(p_j,q_j)$ and interpolants (p, q) in between; Right: Possible choices for base points (green) and obtained interpolants as well as points yielding no statement (red) for the real case.
Because of the appearance of a maximum over the three terms as stated in Theorem 2, the set of all pairs (p, r) of norm indices is naturally subdivided into three segments:

$1 \le p \le 2 \ \wedge\ r \le p'$ (where $C_{p,p,r} = 2^{1/p}$),
$2 \le p \le \infty \ \wedge\ r \le p$ (where $C_{p,p,r} = 2^{1-1/p}$),
$r \ge p \ \wedge\ r \ge p'$ (where $C_{p,p,r} = 2^{1-1/r}$).

That this is an equivalent statement is easily verified analytically and is illustrated in Figure 4.
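The subdivision can be cross-checked against the formula of Theorem 2. In this Python sketch (our own helper names; points on the cusp lines are assigned to the first matching segment, where the competing terms coincide anyway), the term selected by `segment` is compared with the maximum on a small grid of indices.

```python
import math


def inv(s):
    """1/s with 1/inf = 0."""
    return 0.0 if s == math.inf else 1.0 / s


def conjectured_constant(p, r):
    """C_{p,p,r} = max{2^(1/p), 2^(1-1/p), 2^(1-1/r)} as in Theorem 2."""
    return 2.0 ** max(inv(p), 1.0 - inv(p), 1.0 - inv(r))


def segment(p, r):
    """Name of the term attaining the maximum; cusp-line points go to the first match."""
    conj_p = math.inf if p == 1 else (1.0 if p == math.inf else p / (p - 1.0))
    if p <= 2 and r <= conj_p:
        return "2^(1/p)"
    if p >= 2 and r <= p:
        return "2^(1-1/p)"
    return "2^(1-1/r)"   # remaining case: r >= p and r >= p'


TERMS = {
    "2^(1/p)": lambda p, r: 2.0 ** inv(p),
    "2^(1-1/p)": lambda p, r: 2.0 ** (1.0 - inv(p)),
    "2^(1-1/r)": lambda p, r: 2.0 ** (1.0 - inv(r)),
}
grid = [1, 1.2, 1.5, 2, 3, 6, math.inf]
for p in grid:
    for r in grid:
        assert abs(TERMS[segment(p, r)](p, r) - conjectured_constant(p, r)) < 1e-12
print(conjectured_constant(2, 2))   # 1.4142135623730951
```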
The conjecture is already known to be true in some special cases, namely

• p = r = 2; this case is the origin of the investigations and was shown in full generality in [?];

• $p = r \in [1,\infty]$, proven in [10];

• p = 2, $r \in [1,\infty]$, proven in [?].

The conjecture holds trivially for

• p = 1, as $\|XY\|_1 \le \|X\|_1\|Y\|_\infty \le \|X\|_1\|Y\|_r$, and the triangle inequality $\|XY - YX\|_1 \le \|XY\|_1 + \|YX\|_1$ together with (9) gives $C_{1,1,r} = 2$;

• r = ∞, since also $\|XY\|_p, \|YX\|_p \le \|X\|_p\|Y\|_\infty$ hold and (8) realizes equality;

• p = ∞, because of equally simple conclusions.

These pairs (p, r) and their corresponding constants are depicted in Figure 5.

For all 2D images depicting (p, r) of Theorem 2 (as on the left of Figures 4 and 5) we will subsequently omit axis labels to avoid unnecessary information overload.

2.2. A re-interpretation of known cases 

First we reconsider the case p = 2, $r \in [1,\infty]$, but from a different point of view. Its validity was obtained by one of us as a consequence of an even stronger inequality [?]. We want to deduce the value of $C_{2,2,r}$ in a different way, showcasing the two major techniques (complex interpolation and norm index monotonicity, see below) that we will repeatedly use in the rest of the paper.

We will also demonstrate the strong link between the promised pictures and the associated argumentation 
and formulas. 










We know the values of $C_{2,2,2}$ and $C_{2,2,\infty}$ from the inequalities

$$\|[X,Y]\|_2 \le \sqrt2\,\|X\|_2\|Y\|_2 \quad\text{and}\quad \|[X,Y]\|_2 \le 2\|X\|_2\|Y\|_\infty$$

for all $d \times d$ matrices X and Y. The respective pairs of parameter values (2, 2) and (2, ∞) are represented by the green points in the picture at the left.



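Now fix an arbitrary X with $\|X\|_2 = 1$ and consider the commutator as a linear operator

$$K_X: \mathbb{C}^{d\times d} \to \mathbb{C}^{d\times d}, \quad Y \mapsto XY - YX.$$

Clearly, we have

$$\|K_X(Y)\|_2 \le \sqrt2\,\|Y\|_2 \quad\text{and}\quad \|K_X(Y)\|_2 \le 2\|Y\|_\infty$$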



for any Y. As these correspond to the premises (5) of the Riesz-Thorin theorem in its usual form (see Theorem 1 and the comments on generalization thereafter), we endeavour to apply this theorem for p = 2, $r \in (2,\infty)$ (the points on the orange line in the picture). For this we require the validity of (7), which in our case reads: for any $\theta \in (0,1)$

$$\frac12 = \frac{1-\theta}{2} + \frac{\theta}{2} \quad\text{and}\quad \frac1r = \frac{1-\theta}{2} + \frac{\theta}{\infty}.$$

As the first equality is trivially true, the parameter

$$\theta = 1 - \frac2r \in (0,1)$$

is in one-to-one correspondence with all possible interpolants (2, r). Consequently, we obtain the inequality

$$\|K_X(Y)\|_2 \le \sqrt2^{\,1-\theta}\,2^{\theta}\,\|Y\|_r,$$

or equivalently

$$\|XY - YX\|_2 \le 2^{1-1/r}\,\|X\|_2\|Y\|_r,$$

and hence $C_{2,2,r} \le 2^{1-1/r}$, as required. Note that assuming X to be normalised incurs no loss of generality.

Figure 4: The three segments where the constant $C_{p,p,r}$ takes on different values according to Theorem 2 and the reference to examples achieving equality (left), and the graph of $C_{p,p,r}$ as a function of (p, r) (right).

Figure 5: Known special cases for $C_{p,p,r}$. Red: p = r = 2; yellow: $p = r \in [1,\infty]$; blue: p = 2, $r \in [1,\infty]$; the trivial cases in green.
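The interpolation step for p = 2, r ∈ (2, ∞) reduces to arithmetic in θ. A minimal sketch (hypothetical function name) that reproduces $C_{2,2,r} \le 2^{1-1/r}$ from the two base constants $\sqrt2$ and 2:

```python
import math


def c22r_by_interpolation(r):
    """Riesz-Thorin bound for C_{2,2,r}, r in (2, inf), from C_{2,2,2} = sqrt(2)
    and C_{2,2,inf} = 2 (interpolating K_X in the variable Y)."""
    theta = 1.0 - 2.0 / r               # solves 1/r = (1 - theta)/2 + theta/inf
    return math.sqrt(2.0) ** (1.0 - theta) * 2.0 ** theta


for r in [2.5, 3.0, 4.0, 10.0, 1e6]:
    print(r, c22r_by_interpolation(r), 2.0 ** (1.0 - 1.0 / r))   # last two columns agree
```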



For the remaining case $r \in [1,2)$ we can use a simpler concept, which we would like to call norm index monotonicity, or just monotonicity for short. By this we mean the well-known relation

$$\|A\|_p \le \|A\|_q \quad\text{for any } p \ge q$$

and arbitrary matrices A. This procedure could be regarded as an interpolation with only one base point.

In this manner we obtain directly from the knowledge of $C_{2,2,2} = \sqrt2$ that

$$\|XY - YX\|_2 \le \sqrt2\,\|X\|_2\|Y\|_2 \le \sqrt2\,\|X\|_2\|Y\|_r,$$

which gives $C_{2,2,r} \le \sqrt2$ for $r \le 2$ (points on the yellow line).
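Norm index monotonicity is a statement about the singular value vector only, so it can be spot-checked on random vectors. The sketch below (random data, our own helper name) also shows that all Schatten norms of a rank-one matrix coincide, which is why such matrices can achieve equality in monotonicity relations.

```python
import math
import random


def schatten_norm(sigma, p):
    """(sum_j sigma_j^p)^(1/p), or max_j sigma_j for p = inf."""
    return max(sigma) if p == math.inf else sum(s ** p for s in sigma) ** (1.0 / p)


random.seed(0)
indices = [1, 1.5, 2, 3, 8, math.inf]
for _ in range(200):
    sigma = sorted((random.random() for _ in range(5)), reverse=True)
    norms = [schatten_norm(sigma, p) for p in indices]
    # ||A||_p <= ||A||_q whenever p >= q: norms decrease along increasing indices.
    assert all(a >= b - 1e-12 for a, b in zip(norms, norms[1:]))
print([schatten_norm([1.0, 0.0, 0.0], p) for p in indices])   # rank one: all equal to 1.0
```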

For both cases, r > 2 and $1 \le r < 2$, the proof is now easily completed by providing an example of two matrices that achieve equality, as we have already done in Section 2.1.

We will keep to this convention: known base points are pictured in green, an interpolation process is indicated by an orange line, and the use of the monotonicity argument by a yellow line.

2.3. Towards a full proof 

In this section we give an intuitive overview of the proof of Theorem 2, but on the other hand also provide the necessary details for more demanding readers. To accommodate both audiences we have adopted an unusual style that may be called a scientific graphic novel. In an attempt to avoid boring the reader too much, the level of detail will be reduced in due course when coming across cases that are similar to already covered ones.

Roughly speaking, the proof can be subdivided into four parts, each part corresponding to one of the four quadrants of the parameter space: the lower left quadrant, corresponding to p, r ≤ 2, the lower right, p ≥ 2, r ≤ 2, the upper right, p, r ≥ 2, and the upper left quadrant, p ≤ 2, r ≥ 2. We begin with the lower left quadrant.

The conjecture can easily be shown to be true for p, r ≤ 2 by ordinary Riesz-Thorin interpolation. For this, fix $r \in [1,2]$ arbitrarily.

We obtain two points on the green lines in the picture, for which we have

$$\|[X,Y]\|_1 \le 2\|X\|_1\|Y\|_r \quad\text{and}\quad \|[X,Y]\|_2 \le \sqrt2\,\|X\|_2\|Y\|_r.$$

Regard the commutator as a map $K_Y(X) = [X,Y]$ with some fixed Y with $\|Y\|_r = 1$. So,

$$\|K_Y(X)\|_1 \le 2\|X\|_1 \quad\text{and}\quad \|K_Y(X)\|_2 \le \sqrt2\,\|X\|_2.$$

As the norm indices of original and target space coincide for both inequalities, we need to satisfy

$$\frac1p = \frac{1-\theta}{1} + \frac{\theta}{2}$$

twice. Hence, $\theta = 2 - 2/p$, and from the Riesz-Thorin theorem we immediately get $C_{p,p,r} \le 2^{1-\theta}\sqrt2^{\,\theta} = 2^{1/p}$.

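Interpolation also works in the case p > 2, r < 2. Again, fix $r \in [1,2]$. Here, we have

$$\|[X,Y]\|_2 \le \sqrt2\,\|X\|_2\|Y\|_r \quad\text{and}\quad \|[X,Y]\|_\infty \le 2\|X\|_\infty\|Y\|_r.$$

Interpolation of $K_Y$ now requires

$$\frac1p = \frac{1-\theta}{2} + \frac{\theta}{\infty},$$

which amounts to $\theta = 1 - 2/p$ and yields $C_{p,p,r} \le \sqrt2^{\,1-\theta}\,2^{\theta} = 2^{1-1/p}$.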
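Both lower quadrants rest on the same one-dimensional computation, only with different base points. A sketch (hypothetical function names; the bounds do not depend on $r \in [1,2]$) comparing the interpolated constants with the claimed values:

```python
import math


def lower_left_bound(p):
    """p in [1, 2]: interpolate K_Y between constant 2 at index 1 and sqrt(2) at index 2."""
    theta = 2.0 - 2.0 / p                 # solves 1/p = (1 - theta)/1 + theta/2
    return 2.0 ** (1.0 - theta) * math.sqrt(2.0) ** theta


def lower_right_bound(p):
    """p in [2, inf): interpolate K_Y between sqrt(2) at index 2 and 2 at index inf."""
    theta = 1.0 - 2.0 / p                 # solves 1/p = (1 - theta)/2 + theta/inf
    return math.sqrt(2.0) ** (1.0 - theta) * 2.0 ** theta


for p in [1.0, 1.25, 1.5, 2.0]:
    print(p, lower_left_bound(p), 2.0 ** (1.0 / p))          # expected 2^(1/p)
for p in [2.0, 3.0, 5.0, 50.0]:
    print(p, lower_right_bound(p), 2.0 ** (1.0 - 1.0 / p))   # expected 2^(1-1/p)
```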

Note that the applicability of Theorem 1 comes from fixing the variable Y (for given norm index r) or X (for given p, as in the last subsection), which is expressed by a vertical or horizontal line in our graphics of norm index pairs (p, r). We remark that jointly interpolating p and r for bivariate inequalities is not supported by the original theorem. Hence, slanted (non-horizontal, non-vertical) lines for interpolation are forbidden here.

Now, having covered half of the proof, we find that interpolation will not work for p > 2, $r \in (2,\infty)$ as it did in the previous cases. Regardless of whether we interpolate $K_Y$ (left) or $K_X$ (right), i.e. fix r or p, the obtained bound always gives a larger value than the one claimed in Theorem 2:

$$2^{1-2/(rp)} > \max\{2^{1-1/p}, 2^{1-1/r}\}.$$

See Figure 6 for an illustration of the difference.

Analogously, for p < 2, r > 2, interpolation does not yield the desired bound either, but gives the larger value $2^{1-2/r+2/(rp)}$.

This actually was to be expected, since interpolation produces smooth bounds, while the claimed constant is not smooth as a function of p and r. To wit, the graph of the constant exhibits cusps at the lines r = p and r = p', lines that are intersected by the current directions of interpolation; for this reason we call these lines cusp lines.
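The failure of single-step interpolation in the upper quadrants can be made quantitative. The sketch below (our own helper names, finite indices only) evaluates the bounds obtained by fixing r and interpolating $K_Y$, checks them against the closed forms stated above, and confirms at a few sample points that they exceed the value of Theorem 2.

```python
def claimed(p, r):
    """Theorem 2: C_{p,p,r} = 2^max(1/p, 1-1/p, 1-1/r), for finite p, r."""
    return 2.0 ** max(1.0 / p, 1.0 - 1.0 / p, 1.0 - 1.0 / r)


def upper_right_single_step(p, r):
    """Fix r > 2, interpolate between (2, r) with 2^(1-1/r) and (inf, r) with 2."""
    theta = 1.0 - 2.0 / p
    return (2.0 ** (1.0 - 1.0 / r)) ** (1.0 - theta) * 2.0 ** theta


def upper_left_single_step(p, r):
    """Fix r > 2, interpolate between (1, r) with 2 and (2, r) with 2^(1-1/r)."""
    theta = 2.0 - 2.0 / p
    return 2.0 ** (1.0 - theta) * (2.0 ** (1.0 - 1.0 / r)) ** theta


for p, r in [(3.0, 3.0), (4.0, 2.5), (6.0, 10.0)]:
    b = upper_right_single_step(p, r)
    print(p, r, b, 2.0 ** (1 - 2 / (r * p)), b > claimed(p, r))           # last entry: True
for p, r in [(1.5, 3.0), (1.2, 4.0), (1.8, 2.5)]:
    b = upper_left_single_step(p, r)
    print(p, r, b, 2.0 ** (1 - 2 / r + 2 / (r * p)), b > claimed(p, r))   # last entry: True
```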

Nevertheless, we can obtain sharp values for Cp^p^r in a more complicated, two-step interpolation process, 
combining one of the more advanced versions of the Riesz-Thorin method with its ordinary version, and 
carefully choosing the right step at the right time. 

The value of $C_{p,p,r}$ along one of the cusp lines, namely the main diagonal r = p, can be obtained with the help of the tensor structure interpolation mentioned in Section 1.3 between $C_{1,1,1}$, $C_{2,2,2}$ and $C_{\infty,\infty,\infty}$. By applying the usual interpolation statements to these special arguments, interpolation along diagonal lines becomes possible. This has already been done in [?], with the result $C_{p,p,p} = 2^{1-1/p}$ for p ≥ 2 (and $C_{p,p,p} = 2^{1/p}$ for p ≤ 2).

Note that for p ≤ 2 this result has also been obtained above, but in a single step, using ordinary Riesz-Thorin interpolation. This shows that the more complicated approach does not always lead to sharper bounds.




Figure 6: Claimed values (blue) for $C_{p,p,r}$ and interpolation bounds (green) for the lines r = p in the upper right quadrant of parameter space (left) and r = p' in the upper left quadrant (right).



Instead of interpolating over the whole upper right quadrant, we can now do this in a triangle only, and get sharp values, exhibiting the cusp at the diagonal. Fix $r \in (2,\infty)$. We have

$$\|[X,Y]\|_r \le 2^{1-1/r}\|X\|_r\|Y\|_r \quad\text{and}\quad \|[X,Y]\|_\infty \le 2\|X\|_\infty\|Y\|_r,$$

and consider $K_Y$. By

$$\frac1p = \frac{1-\theta}{r} + \frac{\theta}{\infty}$$

and consequently $\theta = 1 - r/p$, we get $C_{p,p,r} \le (2^{1-1/r})^{1-\theta}\,2^{\theta} = 2^{1-1/p}$, as claimed.

Similarly, for the second triangle, fix $p \in (2,\infty)$. We know the bounds for the two base points:

$$\|[X,Y]\|_p \le 2^{1-1/p}\|X\|_p\|Y\|_p \quad\text{and}\quad \|[X,Y]\|_p \le 2\|X\|_p\|Y\|_\infty.$$

Now, by interpreting the commutator as the linear map $K_X$ we obtain

$$\frac1r = \frac{1-\theta}{p} + \frac{\theta}{\infty},$$

so that $\theta = 1 - p/r$ and $C_{p,p,r} \le (2^{1-1/p})^{1-\theta}\,2^{\theta} = 2^{1-1/r}$.
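The two triangle interpolations can be checked in the same way. The function names are hypothetical and the sample points lie in the respective triangles; in both cases the interpolated constant reproduces the claimed value exactly.

```python
def triangle_below_diagonal(p, r):
    """2 < r <= p: fix r, interpolate K_Y between (r, r) with 2^(1-1/r) and (inf, r) with 2."""
    theta = 1.0 - r / p                    # solves 1/p = (1 - theta)/r + theta/inf
    return (2.0 ** (1.0 - 1.0 / r)) ** (1.0 - theta) * 2.0 ** theta


def triangle_above_diagonal(p, r):
    """2 < p <= r: fix p, interpolate K_X between (p, p) with 2^(1-1/p) and (p, inf) with 2."""
    theta = 1.0 - p / r                    # solves 1/r = (1 - theta)/p + theta/inf
    return (2.0 ** (1.0 - 1.0 / p)) ** (1.0 - theta) * 2.0 ** theta


for p, r in [(5.0, 3.0), (10.0, 2.5), (4.0, 4.0)]:
    print(triangle_below_diagonal(p, r), 2.0 ** (1 - 1 / p))   # equal up to rounding
for p, r in [(3.0, 5.0), (2.5, 10.0)]:
    print(triangle_above_diagonal(p, r), 2.0 ** (1 - 1 / r))   # equal up to rounding
```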

We ought to remark that it is easier to interpolate the two triangles along the respective other direction (i.e. fixing the variable that was left free above), as the constant in the corresponding inequalities is the same for both interpolation base points and, hence, automatically yields exactly this value for all interpolant inequalities. We don't even need to determine the relation linking the parameter θ with p or r. Note that this only works because some of the base points were determined before by other interpolation steps, whereas the variant given first relies only on the trivial estimates on the boundary of the square.
In the last, upper left quadrant, another cusp line appears. In order to proceed in a similar way as we did with the upper right quadrant, the anti-diagonal r = p' is needed. Fortunately we can obtain these values by a simple duality argument, as follows. For p ≥ 2 we have

$$\|[X,Y]\|_p \le 2^{1-1/p}\|X\|_p\|Y\|_p,$$

obtained from tensor product interpolation (green line). Then for any Y with $\|Y\|_p = 1$ one has $\|K_Y(X)\|_p \le 2^{1-1/p}\|X\|_p$ for all X. We conclude

$$2^{1/p'} = 2^{1-1/p} \ge \sup_{X \ne O}\frac{\|K_Y(X)\|_p}{\|X\|_p} = \sup_{X \ne O}\frac{\|K_Y(X)\|_{p'}}{\|X\|_{p'}},$$

giving $\|[X,Y]\|_{p'} \le 2^{1/p'}\|X\|_{p'}\|Y\|_p$ for p' ≤ 2, which is the assertion for the anti-diagonal (red line); see the proof of Proposition 4 for details on the above equality.
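One standard ingredient behind the equality of the two suprema is that, with respect to the trace inner product $\langle A,B\rangle = \operatorname{Tr}(AB^*)$, the adjoint of $K_Y$ is $K_{Y^*}$, while $\|Y^*\|_p = \|Y\|_p$. This is our own illustration and not necessarily the argument of Proposition 4; the sketch below checks the adjoint identity numerically on random complex 3 × 3 matrices (helper names are ours).

```python
import random


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def adjoint(a):
    """Conjugate transpose A*."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def commutator(x, y):
    """K_Y(X) = XY - YX."""
    xy, yx = matmul(x, y), matmul(y, x)
    return [[u - v for u, v in zip(ru, rv)] for ru, rv in zip(xy, yx)]


def inner(a, b):
    """Trace inner product <A, B> = Tr(A B*)."""
    ab = matmul(a, adjoint(b))
    return sum(ab[i][i] for i in range(len(a)))


def random_matrix(n):
    return [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)] for _ in range(n)]


random.seed(1)
X, Y, Z = random_matrix(3), random_matrix(3), random_matrix(3)
lhs = inner(commutator(X, Y), Z)             # <K_Y(X), Z>
rhs = inner(X, commutator(Z, adjoint(Y)))    # <X, K_{Y*}(Z)>, K_{Y*}(Z) = Z Y* - Y* Z
print(abs(lhs - rhs) < 1e-9)                 # True
```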

It should be clear now that, in complete analogy to the upper right quadrant, we have to interpolate the two triangles separately. Again it doesn't matter along which direction we fix one of the norm indices. Of course, one of the directions is easier than the other. We leave it to the reader to find out which one is preferable.

We skip the details, as the result also follows from a plain duality argument (as we have done for the anti-diagonal). By this argument we directly get

$$C_{p,p,r} = C_{p',p',r},$$

which may be interpreted as a reflection symmetry of the constant about the line p = 2.
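The reflection symmetry is also visible directly in the formula of Theorem 2, since replacing p by p' only swaps the first two terms of the maximum. A quick grid check with our own helper names:

```python
import math


def inv(s):
    """1/s with 1/inf = 0."""
    return 0.0 if s == math.inf else 1.0 / s


def conjugate(p):
    """Conjugate index p' with 1/p + 1/p' = 1."""
    return math.inf if p == 1 else (1.0 if p == math.inf else p / (p - 1.0))


def constant(p, r):
    """Theorem 2: C_{p,p,r} = 2^max(1/p, 1-1/p, 1-1/r)."""
    return 2.0 ** max(inv(p), 1.0 - inv(p), 1.0 - inv(r))


grid = [1, 1.25, 1.5, 2, 3, 5, math.inf]
for p in grid:
    for r in grid:
        assert abs(constant(p, r) - constant(conjugate(p), r)) < 1e-12
print("C_{p,p,r} = C_{p',p',r} on the whole grid")
```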



2.4. A little short-cut




The case p > 2, r < p, and similarly p < 2, r < p', can be done in an even easier way using norm index monotonicity. By this, the value $2^{1-1/p}$ (known for points on the green line in the left picture) extends to all $1 \le r \le p$. So, after the diagonal interpolation (as the very first step), this approach may replace the previous investigations of the lower right quadrant and one triangle. Similarly, once we obtain the anti-diagonal by duality, the observations for the lower left quadrant follow automatically.

Also note that the remaining two single triangles can be merged into a single step. This 
cannot be done with help of the norm index monotonicity, and interpolation (with fixed 
r) becomes necessary. However, since both base inequalities admit the same constant, 
this turns out to be pretty easy. 

Finally, we remark that the diagonal tensor interpolation, the dual anti-diagonal values and the triangle interpolation between both can also be merged into a single step by applying the multilinear extension of the Riesz-Thorin theorem. However, we will not give more details about this, since the treatment of (7) requires the synchronization of three equalities and the calculation of the value of the interpolated bound is no longer that easy.
3. Generalisation



In the previous section we have proven Theorem 2, which is really a special case of inequality (1). In the present section we want to try our two main tools, as well as some slightly more delicate things, to see how much extra mileage they allow us on the road towards a full proof of inequality (1). In that sense, this section is really a continuation of Section 2. The main result of these investigations can be found at the end of this section.




For the general situation (1), three norm indices p, q and r have to be depicted, requiring three-dimensional images. In what follows, we transform the cube $[1,\infty]^3$ by the mapping $\mathrm{Img}^3$ and represent its image using a perspective projection from a fixed viewing direction. Under these circumstances we can again drop axis labels, just as in Section 2. Note that (2, 2, 2) is again represented by the cube's center (red point).



As of now, regions of parameter space will be colored differently depending on the rule that determines $C_{p,q,r}$.




First, we picture the (now proven) originally conjectured special case in the general context. We know the values of

$$C_{p,p,r} = \max\{2^{1/p}, 2^{1-1/p}, 2^{1-1/r}\}$$

and, by swapping the roles of X and Y, also of

$$C_{p,q,p} = \max\{2^{1/p}, 2^{1-1/p}, 2^{1-1/q}\}.$$

These constants are represented by triplets on the planes q = p and r = p. Due to the properties of our scaling function $\mathrm{Img}^3$, the latter are indeed planes (recall similar statements for $\mathrm{Img}^2$ given in Section 1.2).

We combine the two results and moreover modify them in a way that turns out to be more suitable for what follows. Naturally, one has $2^{1-1/p} = 2^{1-1/q}$ and $2^{1-1/p} = 2^{1-1/r}$ in the two planes, respectively, so that on both planes

$$C_{p,q,r} = \max\{2^{1/p}, 2^{1-1/q}, 2^{1-1/r}\}.$$
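The combined formula can be compared with Theorem 2 on both planes. The helper names below are our own, and the plane r = p uses the swapped constant $C_{p,q,p} = C_{p,p,q}$.

```python
def inv(s):
    """1/s with 1/inf = 0."""
    return 0.0 if s == float("inf") else 1.0 / s


def c_ppr(p, r):
    """Theorem 2 (case q = p)."""
    return 2.0 ** max(inv(p), 1 - inv(p), 1 - inv(r))


def combined(p, q, r):
    """Common form max{2^(1/p), 2^(1-1/q), 2^(1-1/r)} on the planes q = p and r = p."""
    return 2.0 ** max(inv(p), 1 - inv(q), 1 - inv(r))


grid = [1, 1.5, 2, 3, 7, float("inf")]
for p in grid:
    for s in grid:
        assert abs(combined(p, p, s) - c_ppr(p, s)) < 1e-12   # plane q = p, with r = s
        assert abs(combined(p, s, p) - c_ppr(p, s)) < 1e-12   # plane r = p, with q = s
print("combined formula agrees with Theorem 2 on both planes")
```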



3.1. Monotonicity conquers (almost) all 

The validity of the conjecture naturally extends to some of the cases with q ≠ p, by applying the norm index monotonicity argument.






First take a look at triplets connected to the constant $2^{1-1/r}$. This value does not depend on p and q. So, we choose some point (r, q, r) on the pictured segment in the left image. We obtain, for all p ≥ r,

$$\|[X,Y]\|_p \le \|[X,Y]\|_r \le 2^{1-1/r}\|X\|_q\|Y\|_r$$

for arbitrary matrices X and Y. Moreover, for points (p, p, r) as in the right image we get

$$\|[X,Y]\|_p \le 2^{1-1/r}\|X\|_p\|Y\|_r \le 2^{1-1/r}\|X\|_q\|Y\|_r$$

for any q ≤ p.





For triplets belonging to $2^{1-1/q}$ we may argue in an analogous way for any p ≥ q (left) and also for any r ≤ p (right). As we only obtain upper bounds, we also need an example achieving equality. One such is given by (8). Note that matrices of rank one are essential for achieving equality in monotonicity relations. We will treat this in more detail later.

Similarly, for the segment where the constant is $2^{1/p}$, which is independent of q and r, one can extend the bound to q ≤ p (left) and r ≤ p (right). Taking into account (9), we see that the value is sharp in these areas.

Summing up the results obtained so far, we get that the constant of Theorem 2 is valid also for a huge part of the general setting, namely for all (p, q, r) with q ≤ p or r ≤ p, except for the points where p > 2, q < 2 and r < 2 hold simultaneously. Here we have one more indication that the areas (like the processes) are a lot easier to visualize than to capture in formulas. We point out the reflection symmetry of the areas and their values. The light blue area is the image of the dark blue area under reflection about the plane q = r, and the pink area is symmetric about that plane. One can even check that the value of $C_{p,q,r}$ equals the value in its mirror point. This symmetry originates from the symmetry of C under interchanging both X with Y and r with q, as will be discussed in more detail at the end of this section (Proposition 4).



3.2. Some more sophisticated techniques 




For the next steps we need the values for points (1, q, q'). These can be obtained by a Hölder-type inequality which is true for Schatten norms,

$$\|XY\|_1 \le \|X\|_q\|Y\|_{q'},$$

whence, combined with the triangle inequality, one has $\|XY - YX\|_1 \le 2\|X\|_q\|Y\|_{q'}$, giving $C_{1,q,q'} \le 2$. Example (9) shows that equality can be achieved.
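The Hölder-type bound and the resulting constant 2 can be spot-checked on random real 2 × 2 matrices. This is a numerical illustration under our own choices (sample size, range of entries, closed-form 2 × 2 singular values), not a proof.

```python
import math
import random


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def schatten(a, p):
    """Schatten p-norm of a real 2x2 matrix via closed-form singular values."""
    (a00, a01), (a10, a11) = a
    h00, h11, h01 = a00**2 + a10**2, a01**2 + a11**2, a00 * a01 + a10 * a11
    s1 = math.sqrt((h00 + h11) / 2 + math.hypot((h00 - h11) / 2, h01))
    s2 = abs(a00 * a11 - a01 * a10) / s1 if s1 > 0 else 0.0   # s1 * s2 = |det A|
    return s1 if p == math.inf else (s1**p + s2**p) ** (1 / p)


def conjugate(q):
    """Conjugate index q' with 1/q + 1/q' = 1."""
    return math.inf if q == 1 else (1.0 if q == math.inf else q / (q - 1))


random.seed(2)
worst = 0.0   # largest observed ||[X, Y]||_1 / (||X||_q ||Y||_q')
for _ in range(2000):
    X = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
    Y = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
    XY, YX = matmul(X, Y), matmul(Y, X)
    comm = [[XY[i][j] - YX[i][j] for j in range(2)] for i in range(2)]
    for q in [1, 1.5, 2, 3, math.inf]:
        rhs = schatten(X, q) * schatten(Y, conjugate(q))
        assert schatten(XY, 1) <= rhs + 1e-12       # Hoelder-type inequality
        worst = max(worst, schatten(comm, 1) / rhs)
print(worst <= 2.0 + 1e-12)   # True
```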
Now take any point (1, q, q') from the line we just obtained and apply the monotonicity tool once more. We get $C_{1,q,r} \le 2$ for all $r \le q'$.

Example (9) again achieves equality here, and the whole triangle admits the value 2. The points in the triangle then serve as base points for the next interpolation step.



Choose q and r arbitrarily in the grey triangle. We are going to interpolate only p between 1 and r (if q ≥ r, left picture) or between 1 and q (if r ≥ q, right picture). For example, for the first case one has, for $\theta \in [0,1]$,

$$\frac1p = \frac{1-\theta}{1} + \frac{\theta}{r},$$

which yields

$$C_{p,q,r} \le 2^{1-\theta}\,(2^{1/r})^{\theta} = 2^{1/p}.$$

Note that it doesn't matter whether we interpolate $K_X$ or $K_Y$, as both q and r are fixed.

For the second case we may proceed in an analogous way, or alternatively rely on the q-r-symmetry already mentioned at the end of Section 3.1.

Also note that the area connected to $2^{1/p}$ is now of the same shape as the areas of $2^{1-1/q}$ and $2^{1-1/r}$. By this, we obtain two more symmetry planes, which are investigated in detail in Proposition [?] and which ensure the symmetry of the values and not only of the areas' shape.





Our next aim is to close the mould formed by the three areas for which the constant is known so far. First we interpolate between the two points $(1, q, q')$ and $(q, q, \infty)$. Both of them admit the constant 2, hence the points in between all share this value. The only remaining task is to determine which points lie in between. Since $q$ is fixed, simple interpolation will work and requires

$$\frac{1}{p} = \frac{1-\theta}{1} + \frac{\theta}{q} \quad\text{and}\quad \frac{1}{r} = \frac{1-\theta}{q'} + \frac{\theta}{\infty}.$$

Combining the latter we obtain the value 2 for all points satisfying

$$\frac{1}{p} = \frac{1}{q} + \frac{1}{r}.$$
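The elimination of $\theta$ can also be checked mechanically. The following sketch (exact arithmetic with Python's `fractions`; the helper name `interpolant` is hypothetical) walks along the segment between $(1, q, q')$ and $(q, q, \infty)$ in reciprocal coordinates and confirms that every interpolant satisfies $1/p = 1/q + 1/r$.

```python
from fractions import Fraction

def interpolant(q, theta):
    """Reciprocal coordinates (1/p, 1/r) of the point with parameter theta
    between (1, q, q') and (q, q, inf); note 1/q' = 1 - 1/q and 1/inf = 0."""
    inv_q = 1 / Fraction(q)
    inv_p = (1 - theta) * 1 + theta * inv_q
    inv_r = (1 - theta) * (1 - inv_q) + theta * 0
    return inv_p, inv_r

q = Fraction(3)
for k in range(9):
    inv_p, inv_r = interpolant(q, Fraction(k, 8))
    assert inv_p == 1 / q + inv_r      # the plane 1/p = 1/q + 1/r
```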

We remark that Img maps the set of these triplets to a planar triangle. After having done this calculation one gets an impression of the difficulties involved in the multilinear version, when three equations come into play.
Now we are in a position to close the gap between the plane that we have treated and the known bodies, by means of interpolation. For this, again fix $q$ and $r$, as only $p$ will vary. Now choose $\tilde p$ such that $\frac{1}{\tilde p} = \frac{1}{q} + \frac{1}{r}$. Hence, $(\tilde p, q, r)$ lies in the brown triangle. The appropriate base point in the light blue triangle is then given by $(q, q, r)$ (left image). For interpolants $(p, q, r)$ we need to satisfy

$$\frac{1}{p} = \frac{1-\theta}{\tilde p} + \frac{\theta}{q},$$

which results in the bound

$$C_{p,q,r} \le 2^{1+1/p-1/q-1/r}.$$

While this value seems rather exotic and maybe even perplexing, it is sharp nonetheless, as demonstrated by the example

The second case is again done in a similar way or obtained by the $q$-$r$ symmetry.
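The "exotic" exponent can be traced with the same kind of bookkeeping. This sketch assumes that the base point $(q, q, r)$ in the light blue triangle carries the constant $2^{1-1/r}$, which is what the stated bound reduces to at $p = q$. It then checks that the interpolated exponent equals $1 + 1/p - 1/q - 1/r$ along the whole segment. The function name `exotic` is our own.

```python
from fractions import Fraction

def exotic(q, r, theta):
    """Interpolate between (p~, q, r) with 1/p~ = 1/q + 1/r (constant 2) and
    (q, q, r) (assumed constant 2**(1 - 1/r)); return 1/p and the exponent."""
    q, r = Fraction(q), Fraction(r)
    inv_p_tilde = 1 / q + 1 / r
    inv_p = (1 - theta) * inv_p_tilde + theta / q
    exponent = (1 - theta) * 1 + theta * (1 - 1 / r)
    return inv_p, exponent

q, r = Fraction(3, 2), Fraction(5, 2)
for k in range(5):
    inv_p, e = exotic(q, r, Fraction(k, 4))
    assert e == 1 + inv_p - 1 / q - 1 / r
```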




3.3. Trouble... 

Knowledge of the constants for the plane $\frac{1}{p} = \frac{1}{q} + \frac{1}{r}$ successfully helped to obtain the values in a triangular pyramid. So it is natural to try the same for the pyramid opposite to it. In order to perform the interpolation we need the value $C_{1,\infty,\infty}$ at the pyramid's top. Unfortunately, this value is no longer independent of the matrix size $d$. Thanks to the well-known inequalities

$$\|XY - YX\|_1 \le \|XY\|_1 + \|YX\|_1 \le 2\|X\|_1\|Y\|_\infty \le 2d\,\|X\|_\infty\|Y\|_\infty, \tag{11}$$

we find a simple upper bound given by $2d$.

Using techniques similar to those used in the last section, one gets the upper bound

$$C_{p,q,r} \le 2d^{\,1/p-1/q-1/r}.$$

This follows in three steps: interpolating the line $p = 1$, $r = \infty$, then the plane $p = 1$, and finally the pyramid $\frac{1}{p} \ge \frac{1}{q} + \frac{1}{r}$.

At this point the interpolation method runs out of steam. Whereas for even $d$ the value is shown to be sharp by the example

$$\tilde X = X \oplus X \oplus \cdots, \qquad \tilde Y = Y \oplus Y \oplus \cdots$$

with $2 \times 2$ matrices $X$ and $Y$ as in (10), we are unable to find an example when $d$ is odd. The reason may be that in this case the estimate (11) is already not sharp.
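Inequality (11) and the sharpness claim for even $d$ can be illustrated numerically. The $2 \times 2$ pair used below, $X = \operatorname{Diag}(1, -1)$ and $Y = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, is our own illustrative choice (we do not claim it is the pair in (10)). Its commutator has trace norm 4, so the block-diagonal sums built with `np.kron` reach the bound $2d$.

```python
import numpy as np

def schatten(a, p):
    """Schatten p-norm from the singular values (p may be np.inf)."""
    s = np.linalg.svd(a, compute_uv=False)
    top = s.max()
    if top == 0.0 or np.isinf(p):
        return top
    return top * ((s / top) ** p).sum() ** (1.0 / p)

x2 = np.diag([1.0, -1.0])                  # illustrative 2x2 pair (assumption)
y2 = np.array([[0.0, 1.0], [1.0, 0.0]])

ratios = {}
for d in (2, 4, 6, 8):
    blocks = np.eye(d // 2)
    x, y = np.kron(blocks, x2), np.kron(blocks, y2)   # X ⊕ X ⊕ ..., Y ⊕ Y ⊕ ...
    ratios[d] = schatten(x @ y - y @ x, 1) / (schatten(x, np.inf) * schatten(y, np.inf))
print(ratios)   # each ratio should equal 2d
```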
The last area not yet investigated is the cube given by $p \ge 2$ and $q, r \le 2$. We do know the value of $C_{p,q,r}$ on three of its facets, namely $\sqrt{2}$. The obvious method to apply is monotonicity. For instance, as indicated in the picture, we may write

$$\|XY - YX\|_p \le \|XY - YX\|_2 \le \sqrt{2}\,\|X\|_q\|Y\|_r$$

for any $p \ge 2$. By this, the upper bound $\sqrt{2}$ is extended to the whole cube. Of course, one can also use the monotonicity argument by reducing $q$ or $r$, based on the other facets instead.



Sadly, this value is not sharp, and we can show this as follows. First we observe that the value $\sqrt{2}$ is obtained solely from the knowledge of $C_{2,2,2} = \sqrt{2}$, as the values on the facets themselves followed from the value at the point $(2, 2, 2)$ using monotonicity. Now, we can use the fact that for $p_1 > p_2$ and $A \ne 0$, equality in $\|A\|_{p_1} \le \|A\|_{p_2}$ holds if and only if $\operatorname{rank} A = 1$. Hence, applying this for all indices, we see that $X$, $Y$ and $XY - YX$ must all be matrices of rank one satisfying the equality $\|XY - YX\|_2 = \sqrt{2}\,\|X\|_2\|Y\|_2$. From Proposition 4.5 of [?] we know that, without loss of generality, two rank one matrices $X$ and $Y$ satisfy this equality only if there are vectors $a, b$ such that $\|a\|_2 = \|b\|_2 = 1$, $X = ab^*$, $Y = ba^*$ and $a^*b = 0$. However, under those conditions, $XY - YX = aa^* - bb^*$ has rank two, yielding a contradiction.
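The final step of this argument is easy to reproduce numerically. In this sketch (NumPy, random orthonormal $a$, $b$ taken from a QR factorisation, arbitrary seed) the pair $X = ab^*$, $Y = ba^*$ attains the ratio $\sqrt{2}$ in the Frobenius norm, while its commutator $aa^* - bb^*$ indeed has rank two.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
# two orthonormal vectors a, b from a reduced QR factorisation
basis, _ = np.linalg.qr(rng.normal(size=(d, 2)) + 1j * rng.normal(size=(d, 2)))
a, b = basis[:, 0], basis[:, 1]

x = np.outer(a, b.conj())          # X = a b^*
y = np.outer(b, a.conj())          # Y = b a^*
z = x @ y - y @ x                  # should equal a a^* - b b^*

same = np.allclose(z, np.outer(a, a.conj()) - np.outer(b, b.conj()))
rank = np.linalg.matrix_rank(z)
frob_ratio = np.linalg.norm(z) / (np.linalg.norm(x) * np.linalg.norm(y))
print(same, rank, frob_ratio)
```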

3.4. The result, so far

The previous steps obtained in this section (that is, the positive ones) add up to the following theorem.
Theorem 3. For $(p, q, r)$ with $\frac{1}{p} \le \frac{1}{q} + \frac{1}{r}$, excluding the octant $p > 2$, $q < 2$ and $r < 2$, one has

$$C_{p,q,r} = \max\{2^{1/p},\ 2^{1-1/q},\ 2^{1-1/r},\ 2^{1+1/p-1/q-1/r}\}.$$

The four segments of $C_{p,q,r}$, corresponding to the four arguments of the maximum function, are given as follows:

$2^{1/p}$ when $q \le p'$, $r \le p'$, $r \le q'$ and $p \le 2$;

$2^{1-1/r}$ when $r \ge p'$, $q \le r$, $q \le p$ and $r \ge 2$;

$2^{1-1/q}$ when $q \ge p'$, $r \le q$, $r \le p$ and $q \ge 2$;
$2^{1+1/p-1/q-1/r}$ when $\frac{1}{p} \le \frac{1}{q} + \frac{1}{r}$, $q \ge p$, $r \ge p$ and $r \ge q'$.
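For experimentation, the maximum in Theorem 3 is straightforward to evaluate. The helper `c_theorem3` below is a hypothetical convenience function: it returns the value and the index of the active segment, but it does not check that $(p, q, r)$ lies in the region covered by Theorem 3.

```python
import math

def inv(t):
    """Reciprocal 1/t, with 1/inf = 0."""
    return 0.0 if math.isinf(t) else 1.0 / t

def c_theorem3(p, q, r):
    """max{2^{1/p}, 2^{1-1/r}, 2^{1-1/q}, 2^{1+1/p-1/q-1/r}} and the index
    (0..3, in this order) of the argument attaining it."""
    exponents = [
        inv(p),
        1 - inv(r),
        1 - inv(q),
        1 + inv(p) - inv(q) - inv(r),
    ]
    k = max(range(4), key=lambda i: exponents[i])
    return 2 ** exponents[k], k

print(c_theorem3(2, 2, 2))          # value sqrt(2); all four arguments coincide
print(c_theorem3(1, 1, math.inf))   # value 2
print(c_theorem3(1.5, 2, 2.5))      # fourth argument is the largest
```

For instance, at $(1.5, 2, 2.5)$ the fourth argument is the largest; there $r \ge q' = 2$ while $r < p' = 3$.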
For $d \times d$ matrices of even size and $(p, q, r)$ with $\frac{1}{p} \ge \frac{1}{q} + \frac{1}{r}$ one has

$$C_{p,q,r} = 2d^{\,1/p-1/q-1/r}.$$

If $d$ is odd, the latter is only an upper bound.

Note that the constants for parameters in the region $\frac{1}{p} \le \frac{1}{q} + \frac{1}{r}$ (i.e. the first four cases of Theorem 3) are independent of the dimension. Hence, the statement is also true in the infinite-dimensional setting of Schatten norms.

The following result summarises all the symmetries we have encountered and also encapsulates the duality arguments mentioned at the end of Section 2.3.



Proposition 4. For any $(p, q, r) \in [1, \infty]^3$ one has

$$C_{p,q,r} = C_{p,r,q}, \qquad C_{p,q,r} = C_{r',q,p'}, \qquad C_{p,q,r} = C_{q',p',r}.$$

These three equalities represent the reflection symmetries of $C_{p,q,r}$ about the planes $q = r$, $r = p'$ and $q = p'$, respectively:






The third picture generalises the duality statement from Section [?].

Proof. The first equality is a mere consequence of $\|[X, Y]\|_p = \|[Y, X]\|_p$ and the resulting possibility of changing the roles of $X$ and $Y$.

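Now for the second equality, observe that for any fixed $X$ with $\|X\|_q = 1$

$$\sup_{Y \ne 0} \frac{\|K_X(Y)\|_p}{\|Y\|_r} = \sup_{\|Y\|_r = 1}\ \sup_{\|W\|_{p'} = 1} |\langle K_X(Y), W\rangle| = \sup_{\|W\|_{p'} = 1}\ \sup_{\|Y\|_r = 1} |\langle Y, K_X^*(W)\rangle| = \sup_{W \ne 0} \frac{\|K_X^*(W)\|_{r'}}{\|W\|_{p'}},$$

and $K_X^* = K_{X^*}$, together with $\|X^*\|_q = \|X\|_q$, imply the assertion. Here, $\langle A, B\rangle = \operatorname{Tr} B^*A$ denotes the inner product associated with the Schatten classes. The third equality is analogous or can be proved by combining the first two equalities. □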
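The adjoint identity used in this proof can be checked numerically. In the sketch below (NumPy, random complex matrices; the names `k` and `inner` are our own) `k(x, y)` stands for $K_X(Y) = XY - YX$ and `inner(a, b)` for $\langle A, B\rangle = \operatorname{Tr} B^*A$; the check confirms $\langle K_X(Y), W\rangle = \langle Y, K_{X^*}(W)\rangle$.

```python
import numpy as np

def inner(a, b):
    """<A, B> = Tr(B^* A)."""
    return np.trace(b.conj().T @ a)

def k(x, y):
    """Commutator map K_X(Y) = XY - YX."""
    return x @ y - y @ x

rng = np.random.default_rng(2)
d = 4
x, y, w = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)) for _ in range(3))

lhs = inner(k(x, y), w)
rhs = inner(y, k(x.conj().T, w))   # adjoint of K_X is K_{X^*}
print(np.isclose(lhs, rhs))
```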

The representations of norms as given in the last proof are called variational characterisations, and they will be of considerable use in the following section, too.



4. Extremal points 

In the previous section we have squeezed the last drop out of the interpolation, monotonicity and duality methods, but two areas in parameter space, a tetrahedron and a cube, still resist treatment. In the present section we finally tackle these recalcitrant areas by finding the value of the constant at two specific points. To do so, some new ideas are needed.
4.1. The skeleton




Figure 7: Visualization of $(p, q, r)$ for which $C_{p,q,r}$ could be determined by means of interpolation and monotonicity from the values at a couple of points (green). The next logical targets are represented by red points.

In Figure 7 we depict all constellations $(p, q, r)$ covered so far in Sections 2 and 3 by marking them in grey. All the values of the constant at these triplets were the result of the knowledge of its value at only four points (or three, using symmetry), namely $(2, 2, 2)$, $(\infty, \infty, \infty)$ and $(1, 1, \infty)$ or $(1, \infty, 1)$ (marked green). We also relied on the values at the points on the orange lines. However, a closer look reveals that these may be obtained from interpolation between two of the four green base points, too.

In Section 3.3 we had a quick glance at the remaining two areas (white). In one situation monotonicity failed, while in the other the value at the interpolation base point was likely not well estimated (at least for odd-sized matrices).

In any case, the natural approach for carrying this further is to find the exact values of the constants $C_{1,\infty,\infty}$ ($d$ odd) and $C_{\infty,1,1}$. These are the triplets marked red in the figure.

4.2. The value of the constant at the corners

In this subsection we provide the value of the constant at the two corners just mentioned. We prove the following theorem:

Theorem 5. For $d \times d$ matrices one has

a) $C_{1,\infty,\infty} = 2d$ if $d$ is even, and $C_{1,\infty,\infty} = d\sqrt{2 + 2\cos(\pi/d)}$ if $d$ is odd;

b) $C_{\infty,1,1} = \sqrt{27}/4$.



Proof of a).

We only need to prove the formula for odd $d$, as the value for even $d$ was already shown in Section 3.3 in a much easier way. However, as it requires no extra effort, we nonetheless prove that particular result again in the same fashion as for the odd case.
A variational characterisation of $C_{1,\infty,\infty}$ is given by

$$C_{1,\infty,\infty} = \max\{\|XY - YX\|_1 : \|X\|_\infty \le 1,\ \|Y\|_\infty \le 1\}.$$



Let us first fix $Y$. The function to be maximised is convex in $X$, and the feasible set of $X$ is convex as well, with extremal points given by the set of unitary matrices. Thus, we can write

$$C_{1,\infty,\infty} = \max\{\|XY - YX\|_1 : X \text{ unitary},\ \|Y\|_\infty \le 1\}.$$

A similar argument allows us to conclude that $Y$ can also be restricted to the set of unitary matrices:

$$C_{1,\infty,\infty} = \max_{X, Y \text{ unitary}} \|XY - YX\|_1.$$

In addition, the trace norm has a variational characterisation as well:

$$\|A\|_1 = \max_{Z \text{ unitary}} |\operatorname{Tr} ZA|.$$
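The variational characterisation of the trace norm is easy to illustrate. Given a singular value decomposition $A = U\Sigma V^*$, the unitary $Z = VU^*$ gives $\operatorname{Tr} ZA = \operatorname{Tr}\Sigma = \|A\|_1$, and no unitary does better. The sketch below (NumPy; random test data and an arbitrary seed) checks both parts.

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
u, s, vh = np.linalg.svd(a)            # a = u @ diag(s) @ vh
trace_norm = s.sum()

z = vh.conj().T @ u.conj().T           # Z = V U^*, a unitary maximiser
attained = abs(np.trace(z @ a))

# random unitaries (Q factors of random complex matrices) never exceed ||A||_1
for _ in range(200):
    w, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
    assert abs(np.trace(w @ a)) <= trace_norm + 1e-9
print(attained, trace_norm)
```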

Thus we get a maximisation over three unitary matrices:

$$C_{1,\infty,\infty} = \max_{X, Y, Z \text{ unitary}} |\operatorname{Tr} Z(XY - YX)|.$$

Every unitary matrix is unitarily equivalent to a diagonal matrix with all diagonal elements of modulus 1. Applying this to $Y$, we get

$$Y = U \operatorname{Diag}(e^{i\theta_1}, e^{i\theta_2}, \ldots, e^{i\theta_d})\, U^* =: ULU^*.$$

The matrix $U$ can be absorbed into $X$ and $Z$, so that without loss of generality we can restrict $Y$ to be of this diagonal form. Indeed,

$$\operatorname{Tr}(ZXY - ZYX) = \operatorname{Tr}(ZXULU^* - ZULU^*X) = \operatorname{Tr}(ZUU^*XULU^* - ZULU^*XUU^*) = \operatorname{Tr}(Z'X'L - Z'LX'),$$

where $Z' = U^*ZU$ and $X' = U^*XU$.

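Then $[X, Y]$ can be rewritten as a Hadamard product: $XY - YX = A \circ X$, with $A$ the matrix with entries $A_{jk} = e^{i\theta_k} - e^{i\theta_j}$. The function to be maximised becomes

$$|\operatorname{Tr} Z(XY - YX)| = |\operatorname{Tr} Z(A \circ X)| = \Big|\sum_{j,k} Z_{kj} A_{jk} X_{jk}\Big|.$$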
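The Hadamard-product form of the commutator with a diagonal $Y$ can be confirmed numerically. A short sketch (NumPy; random angles $\theta_j$ and random $X$ and $Z$, all test data arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
d = 5
phases = np.exp(1j * rng.uniform(0, 2 * np.pi, size=d))   # e^{i theta_j}
y = np.diag(phases)
x = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

# A_jk = e^{i theta_k} - e^{i theta_j}; '*' below is the entrywise (Hadamard) product
a = phases[np.newaxis, :] - phases[:, np.newaxis]
hadamard_ok = np.allclose(x @ y - y @ x, a * x)

# Tr Z(A o X) = sum_{j,k} Z_kj A_jk X_jk
trace_ok = np.isclose(np.trace(z @ (a * x)), np.sum(z.T * a * x))
print(hadamard_ok, trace_ok)
```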

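The Cauchy-Schwarz inequality leads to a further upper bound:

$$|\operatorname{Tr} Z(XY - YX)| \le \sum_{j,k} |Z_{kj}|\,|A_{jk}|\,|X_{jk}| \le \Big(\sum_{j,k} |Z_{kj}|^2\,|A_{jk}|\Big)^{1/2}\Big(\sum_{j,k} |A_{jk}|\,|X_{jk}|^2\Big)^{1/2}.$$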

Applying the maximisation over all unitary $X$ and $Z$ to both sides then yields

$$\max_{X, Z} |\operatorname{Tr} Z(XY - YX)| \le \max_{X} \sum_{j,k} |A_{jk}|\,|X_{jk}|^2,$$

because both factors on the right-hand side can be maximised separately, and both maxima are equal (the transpose of a unitary matrix is again unitary). Now note that the matrix with elements $|X_{jk}|^2$ is a doubly stochastic matrix (because $X$ is unitary). Furthermore, the function to be maximised is linear in $|X_{jk}|^2$. Hence, its maximum over the larger set of all doubly stochastic matrices is achieved at an extremal point of that set. By Birkhoff's theorem these extremal points are the permutation matrices. Thus we have a further reduction:

$$\max_{X, Z} |\operatorname{Tr} Z(XY - YX)| \le \max_{\pi} \sum_j |A_{j\pi(j)}|,$$
where the maximum is over all permutations $\pi$. Observe that this inequality is actually an equality: the right-hand side is attained by taking $X$ to be the permutation matrix representing $\pi$, and $Z^T$ to be the same permutation matrix with its non-zero entries replaced by the phases $\overline{A_{j\pi(j)}}/|A_{j\pi(j)}|$ (or by 1 where $A_{j\pi(j)} = 0$).



We are now left with calculating the maximum, over all angles $\theta_j$ and all permutations $\pi$, of $\sum_j |A_{j\pi(j)}| = \sum_j |e^{i\theta_{\pi(j)}} - e^{i\theta_j}|$. This problem has a nice geometric interpretation. The complex numbers $e^{i\theta_j}$ are points on the unit circle. The permutation $\pi$ maps every point to another point, in a one-to-one fashion. If we draw edges from $e^{i\theta_j}$ to $e^{i\theta_{\pi(j)}}$ we obtain one or more polygons (in general non-convex and self-intersecting), corresponding to the cycles of the permutation. The problem is to distribute the points on the circle and choose the polygons so that the total length of the edges (the total circumference of the polygon(s)) is maximised.



For even $d$, the maximum is easy to find: $d/2$ points are equal to $1$ while the others are $-1$, and the permutation consists of $d/2$ 2-cycles. The maximum length is therefore $2d$. See Figure 8 for an illustration.
The odd case is not that simple. Whereas $d = 3$ can still easily be seen, larger sizes are more difficult. It turns out that the maximal length is obtained when $\pi$ is a cyclic permutation (so that we have only one polygon), the points are the $d$-th roots of unity, and they are connected in the shape of a star polygon with Schläfli symbol $\{d/((d-1)/2)\}$ (see Figure 9). The upshot is that there are $d$ edges, and every edge has the same length $|1 - e^{i(d-1)\pi/d}| = |1 + e^{i\pi/d}|$.
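The geometric problem is small enough to explore by brute force for tiny $d$. The sketch below (plain Python; helper names are our own) is only a consistency check, not a proof: it maximises over all permutations with the points fixed at the $d$-th roots of unity and compares with $d\sqrt{2 + 2\cos(\pi/d)}$. For $d = 3$ it additionally samples random point configurations.

```python
import cmath
import itertools
import math
import random

def total_length(points, perm):
    """Sum of |z_perm(j) - z_j|, the total circumference of the polygon(s)."""
    return sum(abs(points[perm[j]] - points[j]) for j in range(len(points)))

def best_on_roots_of_unity(d):
    """Maximum over all permutations with the points at the d-th roots of unity."""
    pts = [cmath.exp(2j * math.pi * k / d) for k in range(d)]
    return max(total_length(pts, perm) for perm in itertools.permutations(range(d)))

def star_value(d):
    return d * math.sqrt(2 + 2 * math.cos(math.pi / d))

for d in (3, 5, 7):
    print(d, best_on_roots_of_unity(d), star_value(d))

# d = 3: random points on the circle never beat the star value 3*sqrt(3)
random.seed(0)
worst3 = 0.0
for _ in range(2000):
    pts = [cmath.exp(1j * random.uniform(0, 2 * math.pi)) for _ in range(3)]
    worst3 = max(worst3, max(total_length(pts, p) for p in itertools.permutations(range(3))))
```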

We will prove this in two steps. First we calculate the maximal circumference $L(n)$ of a single $n$-gon (corresponding to $\pi$ being cyclic). Second, we show that $L(n)$ is superadditive: $L(\sum_i n_i) \ge \sum_i L(n_i)$. That is, the total circumference does not increase by using a permutation $\pi$ consisting of several shorter cycles.

We first maximise $L(n) = \sum_j |e^{i\theta_{\pi(j)}} - e^{i\theta_j}|$ over all angles $\theta_j$ for $\pi$ a cyclic permutation. As we can relabel the angles, it does not matter which cyclic permutation we take. It therefore suffices to maximise $\sum_{j=1}^n |e^{i\theta_{j-1}} - e^{i\theta_j}|$, with $\theta_0 := \theta_n$, which is equal to $\sum_{j=1}^n |1 - e^{i(\theta_j - \theta_{j-1})}|$. Define $x_j := \theta_j - \theta_{j-1} \pmod{2\pi}$, so that $0 \le x_j < 2\pi$. We can now replace the maximisation over the angles by a maximisation over their differences $x_j$, with the condition that $\sum_j x_j$ should be an integer multiple of $2\pi$ (because of the cyclicity of $\pi$). Noting also that $|1 - e^{ix}| = \sqrt{2 - 2\cos x}$, which in turn is equal to $2\sin(x/2)$ over the interval $0 \le x < 2\pi$, we then have the constrained maximisation

$$L(n) = \max_{k \in \mathbb{N},\ 0 \le k < n}\ \max\Big\{\sum_j 2\sin(x_j/2) : \sum_j x_j = 2k\pi\Big\}.$$

From the concavity of the sine function on the interval $[0, \pi]$, we get

$$\sum_{j=1}^n \sin(x_j/2) \le n\sin\Big(\frac{1}{2n}\sum_{j=1}^n x_j\Big) = n\sin(k\pi/n),$$

with equality if all $x_j$ are equal. Therefore, the maximisation over the $x_j$ is readily done, and we get

$$L(n) = \max_{k \in \mathbb{N},\ 0 \le k < n} 2n\sin(k\pi/n).$$

The remaining maximisation over $k$ is also easy: for even $n$, $L(n) = 2n$ (with $k = n/2$), while for odd $n$ we get the smaller value

$$L(n) = 2n\sin\Big(\frac{(n-1)\pi}{2n}\Big) = n\sqrt{2 + 2\cos(\pi/n)}.$$
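The two closed forms for $L(n)$ can be compared with the maximisation over $k$ directly. A plain-Python sketch (the function names are hypothetical):

```python
import math

def L_over_k(n):
    """max over 0 <= k < n of 2 n sin(k pi / n)."""
    return max(2 * n * math.sin(k * math.pi / n) for k in range(n))

def L_closed(n):
    """2n for even n and n * sqrt(2 + 2 cos(pi/n)) for odd n."""
    return 2 * n if n % 2 == 0 else n * math.sqrt(2 + 2 * math.cos(math.pi / n))

for n in range(1, 16):
    assert math.isclose(L_over_k(n), L_closed(n), rel_tol=1e-12, abs_tol=1e-12)
```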



It remains to prove superadditivity of $L(n)$, i.e.

$$L(n) \ge L(n - k) + L(k), \qquad 1 \le k < n.$$
Figure 8: Distribution of $d$ points on the unit circle forming polygon(s) with maximal total circumference; trivial solution for even $d$ (left) and obvious configuration for $d = 3$ (right).




For even $n$, this is simple: either $k$ and $n - k$ are both even, in which case $L(n - k) + L(k) = 2(n - k) + 2k = 2n = L(n)$, or they are both odd, in which case $L(n - k) + L(k) < 2(n - k) + 2k = 2n = L(n)$.
For odd $n$, consider $k$ odd and $n - k$ even, so that $L(n - k) = 2(n - k)$. We note that $L(n) - 2n$ is an increasing function of $n$ for odd $n$. Hence for odd $k \le n$, $L(n) - 2n \ge L(k) - 2k$, which directly implies $L(n) \ge L(n - k) + L(k)$. This ends the proof of a).
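Superadditivity, and the monotonicity of $L(n) - 2n$ over odd $n$ on which it rests, can be spot-checked for small $n$. This is plain Python and a finite check only, not a substitute for the argument above.

```python
import math

def L(n):
    """Maximal circumference of a single n-gon: 2n (n even), n sqrt(2 + 2cos(pi/n)) (n odd)."""
    return 2 * n if n % 2 == 0 else n * math.sqrt(2 + 2 * math.cos(math.pi / n))

violations = [(n, k) for n in range(2, 80) for k in range(1, n)
              if L(n) < L(n - k) + L(k) - 1e-9]

odd_gaps = [L(n) - 2 * n for n in range(1, 80, 2)]
increasing = all(b > a for a, b in zip(odd_gaps, odd_gaps[1:]))
print(violations, increasing)   # [] True
```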

Proof of b). A variational characterisation of $C_{\infty,1,1}$ is

$$C_{\infty,1,1} = \max\{\|XY - YX\|_\infty : \|X\|_1, \|Y\|_1 \le 1\}.$$

Now note that $\|XY - YX\|_\infty$ is convex in $X$, and that the set of $X$ with $\|X\|_1 \le 1$ is a convex set whose extremal points are the rank one matrices $X = uv^*$, where $u$ and $v$ are normalised vectors. Thus,

$$\max\{\|XY - YX\|_\infty : \|X\|_1 \le 1\}$$

is achieved for $X$ of the form $X = uv^*$.

Similarly, the latter is a convex function in $Y$ (the pointwise maximum of a family of convex functions is again convex) and, therefore, is also maximal for $Y$ of the form $Y = ab^*$, where $a$ and $b$ are normalised vectors. Hence,

$$C_{\infty,1,1} = \max_{u,v,a,b} \|uv^*ab^* - ab^*uv^*\|_\infty.$$

The norm itself also has a variational expression:

$$\|A\|_\infty = \max_{p,q} |p^*Aq|,$$

where $p$ and $q$ are also normalised vectors. We thus end up with a maximisation over 6 normalised vectors:

$$C_{\infty,1,1} = \max |p^*(uv^*ab^* - ab^*uv^*)q| = \max |\langle p,u\rangle\langle v,a\rangle\langle b,q\rangle - \langle p,a\rangle\langle b,u\rangle\langle v,q\rangle|.$$

It is in principle possible to perform this maximisation over each of the 6 vectors in turn, but the calculations immediately become very long-winded. A much better approach is to focus attention on the inner products directly.

Without loss of generality we can restrict the values of all inner products to be real, which can be done simply by considering real vectors only. It is easily seen that $|\langle p,u\rangle\langle v,a\rangle\langle b,q\rangle - \langle p,a\rangle\langle b,u\rangle\langle v,q\rangle|$ cannot be made bigger by allowing complex valued inner products. Thus, let $\langle p,u\rangle = \cos\alpha$, $\langle v,a\rangle = \cos\beta$, $\langle b,q\rangle = \cos\gamma$, $\langle p,a\rangle = \cos\delta$, $\langle b,u\rangle = \cos\eta$ and $\langle v,q\rangle = \cos\theta$. The point to observe now is that, of these angles, exactly 5 can be chosen independently, while the remaining one is then subject to an inequality, as illustrated here:

$$v \xrightarrow{\ \beta\ } a \xrightarrow{\ \delta\ } p \xrightarrow{\ \alpha\ } u \xrightarrow{\ \eta\ } b \xrightarrow{\ \gamma\ } q.$$

In this example, $\theta$, the angle between $v$ and $q$, is restricted to be at most the sum of all other angles (which is not a restriction if that sum is larger than $\pi$). Thus we get

$$C_{\infty,1,1} = \max\{|\cos\alpha\cos\beta\cos\gamma - \cos\delta\cos\eta\cos\theta| : 0 \le \theta \le \alpha + \beta + \gamma + \delta + \eta\}.$$



Lemma 6.

$$\max_{\alpha,\beta:\ \alpha+\beta = x} \cos\alpha\cos\beta = \cos^2(x/2)$$

and

$$\min_{\alpha,\beta:\ \alpha+\beta = x} \cos\alpha\cos\beta = -\sin^2(x/2).$$
Proof of Lemma 6.

$$\max_{\alpha+\beta = x} \cos\alpha\cos\beta = \max_\alpha \cos\alpha\cos(x - \alpha) = \frac{\cos x}{2} + \max_\alpha \frac{\cos(x - 2\alpha)}{2} = \frac{\cos x + 1}{2} = \cos^2(x/2).$$

The minimum is given by $(\cos x - 1)/2 = -\sin^2(x/2)$. □

Lemma 7. For $-\pi \le x \le \pi$,

$$\max_{\alpha,\beta,\gamma:\ \alpha+\beta+\gamma = x} \cos\alpha\cos\beta\cos\gamma = \cos^3(x/3).$$

For $0 \le x \le 2\pi$,

$$\min_{\alpha,\beta,\gamma:\ \alpha+\beta+\gamma = x} \cos\alpha\cos\beta\cos\gamma = -\cos^3((x - \pi)/3).$$

The maximal and minimal values outside these intervals are obtained by periodic extension.
Proof of Lemma 7. By applying Lemma 6 we get

$$\max_{\alpha+\beta+\gamma = x} \cos\alpha\cos\beta\cos\gamma = \max_{y,\gamma:\ y+\gamma = x} \cos\gamma \max_{\alpha+\beta = y} \cos\alpha\cos\beta = \max_{y,\gamma:\ y+\gamma = x} \cos\gamma\cos^2(y/2) = \max_y \cos(x - y)\cos^2(y/2).$$

The stationary points of $\cos(x - y)\cos^2(y/2)$ as a function of $y$ are $y = \pi$ and $y = 2(x + k\pi)/3$, yielding the values $0$ and $\cos^2((x + k\pi)/3)\cos((x - 2k\pi)/3)$. The maximum of these values is $\cos^3(x/3)$ for $-\pi \le x \le \pi$, while the maximum outside this interval is obtained by periodic extension. The minimum is calculated in a similar way. □
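Lemmas 6 and 7 can be spot-checked on a grid. In this plain-Python sketch the grid sizes and the sample points $x$ are arbitrary choices, so agreement is only up to discretisation error; the sample points lie in $[0, \pi]$, where both formulas of Lemma 7 apply.

```python
import math

def grid_pair(x, steps=2000):
    """Max and min of cos(a) cos(x - a) over a grid of a in [0, 2 pi)."""
    grid = [2 * math.pi * i / steps for i in range(steps)]
    values = [math.cos(a) * math.cos(x - a) for a in grid]
    return max(values), min(values)

def grid_triple(x, steps=300):
    """Max and min of cos(a) cos(b) cos(x - a - b) over a grid of (a, b)."""
    grid = [2 * math.pi * i / steps for i in range(steps)]
    values = [math.cos(a) * math.cos(b) * math.cos(x - a - b) for a in grid for b in grid]
    return max(values), min(values)

for x in (0.3, 1.0, 2.5):
    mx, mn = grid_pair(x)
    assert abs(mx - math.cos(x / 2) ** 2) < 1e-5       # Lemma 6, maximum
    assert abs(mn + math.sin(x / 2) ** 2) < 1e-5       # Lemma 6, minimum
    mx3, mn3 = grid_triple(x)
    assert abs(mx3 - math.cos(x / 3) ** 3) < 1e-3                 # Lemma 7, maximum
    assert abs(mn3 + math.cos((x - math.pi) / 3) ** 3) < 1e-3     # Lemma 7, minimum
```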

With this lemma we are thus led to replace the maximisation over the 6 angles by a single maximisation: we maximise the first term over angles $\alpha, \beta, \gamma$ subject to $\alpha + \beta + \gamma = x$, and minimise the second term over angles $\delta, \eta, \theta$ subject to $\theta - \delta - \eta = x$ (as the signs of $\delta$ and $\eta$ are irrelevant in $\cos\delta\cos\eta\cos\theta$, we can use Lemma 7 here too). This leads to

$$C_{\infty,1,1} = \max_{0 \le x \le \pi} \left(\cos^3(x/3) + \cos^3((x - \pi)/3)\right).$$

The maximum is achieved for $x = \pi/2$ and equals $\sqrt{27}/4$. This ends the proof of b). □
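The final one-dimensional maximisation is easily confirmed on a fine grid (plain Python; the grid resolution is an arbitrary choice):

```python
import math

def f(x):
    """cos^3(x/3) + cos^3((x - pi)/3): Lemma 7 maximum minus Lemma 7 minimum."""
    return math.cos(x / 3) ** 3 + math.cos((x - math.pi) / 3) ** 3

xs = [math.pi * i / 10000 for i in range(10001)]
x_best = max(xs, key=f)
print(x_best, f(x_best), math.sqrt(27) / 4)   # x_best is (numerically) pi/2
```

The function is symmetric about $\pi/2$, where both cosines equal $\cos(\pi/6) = \sqrt{3}/2$, giving $2(\sqrt{3}/2)^3 = \sqrt{27}/4$.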



4.3. Interpolation revisited

The major hope behind Theorem 5 is of course that we might be able to use interpolation to close the two gaps for the unknown triplets $(p, q, r)$.

For $p \ge 2$, $q, r \le 2$ (the cube in the lower right of the illustrations) interpolation turns out to be the wrong method, at least with the present data. We demonstrate this for the line $q = r = 1$ ($p \ge 2$). Even the exact value of $C_{\infty,1,1}$ does not force interpolation bounds to be sharp in this case. We obtain

$$C_{p,1,1} \le C_{2,1,1}^{2/p}\, C_{\infty,1,1}^{1-2/p} = 2^{1/p}\left(\frac{\sqrt{27}}{4}\right)^{1-2/p}. \tag{12}$$



However, as in the proof of Theorem 5 b), one can show that for these points the $X$ and $Y$ achieving the maximum are matrices of rank one. Hence, $XY - YX$ has at most two non-zero singular values. Combining the knowledge of

$$\sqrt{\sigma_1^2 + \sigma_2^2} \le \sqrt{2} \quad\text{and}\quad \sigma_2 \le \sigma_1 \le \sqrt{27}/4$$

already yields a better upper estimate than (12), in fact

$$C_{p,1,1} \le \left(C_{\infty,1,1}^p + \left(C_{2,1,1}^2 - C_{\infty,1,1}^2\right)^{p/2}\right)^{1/p} = \frac{1}{4}\left(27^{p/2} + 5^{p/2}\right)^{1/p}. \tag{13}$$
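The two upper bounds are simple to tabulate. In this sketch the interpolation bound is taken as $C_{2,1,1}^{2/p}C_{\infty,1,1}^{1-2/p}$ and the rank-two bound as the $\ell_p$ norm of $(\sigma_1, \sigma_2)$ with $\sigma_1 = C_{\infty,1,1}$ and $\sigma_2 = (C_{2,1,1}^2 - C_{\infty,1,1}^2)^{1/2}$, using $C_{2,1,1} = \sqrt{2}$ and $C_{\infty,1,1} = \sqrt{27}/4$. The function names are our own.

```python
import math

C2 = math.sqrt(2)             # C_{2,1,1}
CINF = math.sqrt(27) / 4      # C_{inf,1,1}, Theorem 5 b)

def bound_interp(p):
    """Interpolation bound: C_{2,1,1}^{2/p} * C_{inf,1,1}^{1-2/p}."""
    return C2 ** (2 / p) * CINF ** (1 - 2 / p)

def bound_rank2(p):
    """p-norm of (sigma_1, sigma_2) with sigma_1 = C_inf, sigma_1^2 + sigma_2^2 = 2."""
    s1, s2 = CINF, math.sqrt(C2 ** 2 - CINF ** 2)
    return (s1 ** p + s2 ** p) ** (1 / p)

for p in (2, 3, 4, 8, 16, 64):
    print(p, round(bound_interp(p), 4), round(bound_rank2(p), 4))
```

Both bounds coincide at $p = 2$ and tend to $\sqrt{27}/4$ as $p \to \infty$. In between, the rank-two bound is the smaller one, since the $\ell_p$ norm of a fixed vector is log-convex in $1/p$.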



Recalling the example given in Section [?], we may ensure

$$C_{p,1,1} \ge 2^{1/p}. \tag{14}$$

Alas, this is a worse lower bound for $p \to \infty$, as it tends to $1 < \sqrt{27}/4 \approx 1.299$. The trickier example of two (normed) rank one matrices from [?, page 1880] gives us the curve



r- VScosf/- sinfj) 

cri,o-2) = v2-— — —— 

1 + 2 cos ffl sm ( 



(cos 0, sin 0) with (/)e[0,7r/4] 



(15) 



for possible singular values of $XY - YX$.

By choosing a point on the curve (15) with $p$-norm as large as possible we obtain a very good lower bound for $C_{p,1,1}$, which is numerically approximated in Figure 10. Moreover, we conjecture that the resulting value is equal to the constant $C_{p,1,1}$ for $p \ge 2$. The estimates given by the upper and lower bounds (also pictured in the figure) are already very tight.

Note that $C_{\infty,q,1}$ and $C_{\infty,1,r}$ can be determined by duality from $C_{p,1,1}$. Recall the symmetries of Proposition 4 for that purpose.



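Similarly, for odd-sized $d \times d$ matrices in the upper left pyramid there seems to be one more plane of cusps, determined by the example used for $C_{1,\infty,\infty}$ in Theorem 5 a),

$$X = \begin{pmatrix} O & I_{\lceil d/2\rceil} \\ I_{\lfloor d/2\rfloor} & O \end{pmatrix}, \qquad Y = \operatorname{Diag}(1, \omega, \omega^2, \ldots, \omega^{d-1}), \quad \omega = e^{2\pi i/d},$$

yielding the value

$$C_{p,\infty,\infty} \ge d^{1/p}\sqrt{2 + 2\cos(\pi/d)} \tag{16}$$

on the one hand, as well as the value

$$C_{p,\infty,\infty} \ge 2(d-1)^{1/p} \tag{17}$$

on the other hand, given by padding an example of even size $d - 1$ with a zero row and column,

$$\hat X = X_2 \oplus \cdots \oplus X_2 \oplus 0, \qquad \hat Y = Y_2 \oplus \cdots \oplus Y_2 \oplus 0,$$

where $X_2$, $Y_2$ denote the $2 \times 2$ matrices from (10).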

In the picture we indicated the areas of the pyramid where the examples yielding (16) or (17) represent the largest known lower bounds. Assuming that these two values are equal to $C_{p,\infty,\infty}$ and that the two examples achieve the pyramid's values, it is left as a simple exercise in interpolation to show that the interface boundary surface is really mapped to a plane by Img. The second example is indeed a direct sum of the $2 \times 2$ matrices (10) from Section 3.3, padded with an additional row and column of zeroes to get an odd dimension; recall that for even dimension these matrices did achieve equality.
The interpolation bound

$$C_{p,\infty,\infty} \le \left(d\sqrt{2 + 2\cos(\pi/d)}\right)^{1/p} \cdot 2^{1-1/p} \tag{18}$$

is likely not sharp.

Figure 10: Estimates for $C_{p,1,1}$: upper bounds (12) (red) and (13) (blue) and lower bounds (14) (green) and (15) (black).

Figure 11: Estimates for $C_{p,\infty,\infty}$: upper bound (18) (solid) and lower bounds (16) (dashed) and (17) (dotted) for $d = 3$ (left) and $d = 5$ (right).

From Figure 11 we are tempted to conjecture that the difference between the bound (18) and $\max\{(16), (17)\}$ vanishes as $d \to \infty$. Moreover, the index $p_0$ for which (16) and (17) coincide seems to tend to infinity as the size $d$ is increased.
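The bounds for $C_{p,\infty,\infty}$ and the crossing index $p_0$ can be computed in closed form. In this sketch we take (16) as $d^{1/p}c_d$, (17) as $2(d-1)^{1/p}$ and (18) as $(d\,c_d)^{1/p}2^{1-1/p}$, with $c_d = \sqrt{2 + 2\cos(\pi/d)}$. Solving $d^{1/p}c_d = 2(d-1)^{1/p}$ gives $p_0 = \ln(d/(d-1))/\ln(2/c_d)$. The numbers only illustrate the tendency described above; the function names are our own.

```python
import math

def c(d):
    """c_d = sqrt(2 + 2 cos(pi/d)), the edge length of the star polygon."""
    return math.sqrt(2 + 2 * math.cos(math.pi / d))

def lower16(p, d):
    return d ** (1 / p) * c(d)

def lower17(p, d):
    return 2 * (d - 1) ** (1 / p)

def upper18(p, d):
    return (d * c(d)) ** (1 / p) * 2 ** (1 - 1 / p)

def crossover(d):
    """p0 solving d^{1/p} c_d = 2 (d - 1)^{1/p}."""
    return math.log(d / (d - 1)) / math.log(2 / c(d))

for d in (3, 5, 11, 101):
    p0 = crossover(d)
    print(d, round(p0, 3), round(lower16(p0, d), 6), round(lower17(p0, d), 6))
```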



5. Maximality 

In [?], after giving the first general proof of $C_{2,2,2} = \sqrt{2}$, the notion of maximality was introduced. This notion was subsequently extended to $p$-maximality in [10]. Consistent with these definitions, we want to investigate the maximality problem in the general context and call a pair $(X, Y)$ of $d \times d$ matrices $(p, q, r)$-maximal if both $X$ and $Y$ are non-zero and satisfy (1) with equality, i.e.

$$\|XY - YX\|_p = C_{p,q,r}\|X\|_q\|Y\|_r.$$

In contrast to [?], we are only looking here at the Schatten norms.

A characterization of $(2, 2, 2)$-maximality, which is called maximality in [?] and Schatten 2-maximality in [10], was recently given in [?]. This result will serve as a basis for further investigations to derive criteria for maximality in the $(p, q, r)$ case, in combination with the tools we have used in Sections 2 and 3 to obtain the exact values of the bound $C_{p,q,r}$.

First of all, we will see that the method of monotonicity imposes strong restrictions.
Lemma 8.

a) If $C_{\tilde p,q,r} = C_{p,q,r}$ was obtained by monotonicity via increasing $p < \tilde p$ and $(X, Y)$ is $(\tilde p, q, r)$-maximal, then $\operatorname{Rank}(XY - YX) = 1$ and $(X, Y)$ is $(p, q, r)$-maximal.

b) If $C_{p,\tilde q,r} = C_{p,q,r}$ was obtained by monotonicity via decreasing $q > \tilde q$ and $(X, Y)$ is $(p, \tilde q, r)$-maximal, then $\operatorname{Rank} X = 1$ and $(X, Y)$ is $(p, q, r)$-maximal. An analogous statement is true for $r$ and $Y$.

Proof. The monotonicity argument in a) works as follows:

$$\frac{\|XY - YX\|_{\tilde p}}{\|X\|_q\|Y\|_r} \le \frac{\|XY - YX\|_p}{\|X\|_q\|Y\|_r} \le C_{p,q,r} = C_{\tilde p,q,r}.$$

Hence, if $(X, Y)$ is $(\tilde p, q, r)$-maximal, the left-hand side equals $C_{\tilde p,q,r}$, and this implies the $(p, q, r)$-maximality of the pair, since all of the inequalities in the chain become equalities. Moreover, we get

$$\|XY - YX\|_{\tilde p} = \|XY - YX\|_p$$

for $\tilde p > p$, which is only possible if the corresponding matrix has rank one.

The proof of b) is similar. □

In [10] we argued that some properties are preserved by interpolation. Furthermore, we used the fact that, in particular, a rank one structure is left untouched. Of course, this argument only works if the obtained interpolation bounds are sharp.

Lemma 9. Let $1 < p, q, r < \infty$.

a) If $C_{p,q,r}$ is obtained by interpolation connected to a base for which all matrices $X$ of a maximal pair $(X, Y)$ have rank one, then also the matrices $X$ of a $(p, q, r)$-maximal pair $(X, Y)$ must have rank one. Similar statements hold for $Y$ and $XY - YX$.

b) If $C_{p,q,r}$ is obtained by interpolation between any point and $(2, 2, 2)$ (directly or via several steps) and $(X, Y)$ is $(p, q, r)$-maximal, then $(X, Y)$ is $(2, 2, 2)$-maximal.

c) If $C_{p,q,r}$ is obtained by interpolation connected to a base for which all matrices $X$ of a maximal pair $(X, Y)$ are unitarily similar to matrices of the type $\begin{pmatrix} 0 & x_{12} \\ x_{21} & 0 \end{pmatrix} \oplus O$ with $|x_{12}| = |x_{21}|$, then also all matrices $X$ of a $(p, q, r)$-maximal pair $(X, Y)$ have this property. Similar statements hold for $Y$ and $XY - YX$.

Proof. The key point in the proofs is that if $(X, Y)$ is a maximal pair with respect to an interpolated triplet, then an appropriately modified pair $(\tilde X, \tilde Y)$ is maximal with respect to the base point triplet.
An analysis in [13, proof of Proposition 8] showed that the matrix $\tilde X$ is actually a scaled version of $X$ in the sense that every entry (i.e. a complex number) keeps its complex argument, but has its absolute value raised to a specific power (one of us calls this operation a polar power; see [?]). More precisely, if $X_{jk} = re^{i\varphi}$ then $\tilde X_{jk} = r^{\kappa}e^{i\varphi}$ for a fixed exponent $\kappa$ depending only on the interpolation.

Clearly, an entry with the value $0$ is not altered in any way by this procedure. Moreover, the claim of a) was already proven true and applied in [?] based on these ideas.

Now, for any interpolation connected to the base point $(2, 2, 2)$, or to any other point that has been obtained by such a process, the scaled pair needs to be maximal in the original sense. This statement is true if the interpolation process is the usual Riesz-Thorin theorem (complex version) or the tensor argument extension, since the tensor structure is unharmed by the scaling procedure.
By Theorems 3.1 and 3.2 of [?], all these pairs are given, for some unitary $U$, by

$$UXU^* = X_0 \oplus O, \qquad UYU^* = Y_0 \oplus O$$

with $X_0, Y_0 \in \mathbb{C}^{2\times 2}$ and

$$0 = \operatorname{Tr} X_0 = \operatorname{Tr} Y_0 = \operatorname{Tr} Y_0^*X_0.$$

The only information that we had obtained in [?] about matrices of a maximal pair was that they should have rank at most two, which was not enough to obtain meaningful restrictions for interpolants in [10]. But with the simultaneous unitary similarity to essentially $2 \times 2$ matrices, it is now easy to see that $\operatorname{Tr} X_0 = 0$ also implies $\operatorname{Tr} \tilde X_0 = 0$. The last conclusion is only possible because the trace is now the sum of only two entries; equivalently, we have the relation

$$\operatorname{Tr} X_0 = 0 \iff x_{11} = -x_{22},$$

which is preserved when the moduli are scaled to obtain $\tilde X$.

For transferring the orthogonality of $X_0$ and $Y_0$ to $\tilde X_0$ and $\tilde Y_0$ we furthermore need the well-known statement that a trace zero matrix is unitarily similar to a matrix whose diagonal elements are all zero ([?, p. 77]).

Hence, without loss of generality we may assume $X_0 = \begin{pmatrix} 0 & x_{12} \\ x_{21} & 0 \end{pmatrix}$, implying for the scalar product

$$0 = \operatorname{Tr} Y_0^*X_0 = x_{12}\overline{y_{12}} + x_{21}\overline{y_{21}}.$$

Of course, the latter is also preserved by scaling, since there are again only two summands, in each of which exactly one entry of $X$ and one of $Y$ appear as factors. Since $(\tilde X, \tilde Y)$ obeys the same relations as $(X, Y)$ specified above and the theorems in [?] yield necessary and sufficient conditions, we obtain the $(2, 2, 2)$-maximality of the pair.

The claim of c) can be shown in a similar but even simpler fashion. □
Remark. Part c) of Lemma 9 is a generalization of a), since every rank one matrix with trace zero is unitarily similar to a matrix of the type $\begin{pmatrix} 0 & x \\ 0 & 0 \end{pmatrix} \oplus O$. The zero trace will automatically be given in combination with b). Note that for the type of matrices $X$ described in c) one has $\sigma(X) = (c, c, 0, \ldots)$ for some $c > 0$. Hence, $X$ is unitarily similar to a multiple of a unitary $2 \times 2$ matrix that is padded with zeros.
As all matrices of maximal pairs connected to b) have rank not greater than two, we are able to apply the strong estimate

$$\|A\|_p \le \|A\|_q \le 2^{1/q - 1/p}\|A\|_p \qquad \forall\, p \ge q$$

for Schatten norms. In general, the constant 2 in the second inequality would have to be replaced by the rank or even the size $d$. Such estimates were crucial in [10] to determine $(1, 1, 1)$-maximal pairs and will also be of use in the following.
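The rank-two estimate can be spot-checked on random matrices of rank at most two (NumPy; arbitrary seed and exponent ranges; the helper `schatten` is our own). Equality on the right holds, for example, for $A = \operatorname{Diag}(1, 1, 0, 0)$, whose two non-zero singular values coincide.

```python
import numpy as np

def schatten(a, p):
    """Schatten p-norm from the singular values (p may be np.inf)."""
    s = np.linalg.svd(a, compute_uv=False)
    top = s.max()
    if top == 0.0 or np.isinf(p):
        return top
    return top * ((s / top) ** p).sum() ** (1.0 / p)

rng = np.random.default_rng(5)
d = 6
worst = 0.0
for _ in range(500):
    # random matrix of rank at most two
    a = (rng.normal(size=(d, 2)) + 1j * rng.normal(size=(d, 2))) @ \
        (rng.normal(size=(2, d)) + 1j * rng.normal(size=(2, d)))
    q, p = sorted(rng.uniform(1.0, 6.0, size=2))          # ensures p >= q
    norm_p, norm_q = schatten(a, p), schatten(a, q)
    upper = 2 ** (1 / q - 1 / p) * norm_p
    assert norm_p <= norm_q * (1 + 1e-12)
    worst = max(worst, norm_q / upper)                    # stays <= 1

eq = np.diag([1.0, 1.0, 0.0, 0.0])                        # equality case
gap = abs(schatten(eq, 1.5) - 2 ** (1 / 1.5 - 1 / 4) * schatten(eq, 4))
print(worst, gap)
```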

Both lemmas imply that maximal pairs can only be found in a very limited range. We will see that only the boundary of the parameter space may need a separate treatment, but it will mostly fit with the results for the interior. Lemma 9 b) moreover implies that $(2, 2, 2)$-maximality can be expected to be richer than others (excluding possibly cases like $(\infty, \infty, \infty)$ at the boundary). Before proceeding with the consequences of these two results we need to introduce a new drawing convention that we will adhere to.

Up to now we had no problems picturing sets of points $(p, q, r)$, as all of them were closed sets, i.e. points, lines with end-points, or complete bodies containing all of their bounding facets. However, for visualizing areas connected to shared properties of maximality we will encounter open sets. In order to visualize them in a comprehensible way we draw only lines and points instead of colored facets, in the following way:

This marks all points on the line except the right end. 

This picture marks all points of the triangle, excluding the grey edge at the right. The 
grey line itself contains its end-points. 

In three-dimensional space this marks the complete body enclosed by the facets of the 
same colour and their neighbours. If a line or facet is colored differently it is excluded. 
For instance, the image on the left marks, in black, the whole cube except the back 
facet and its boundaries. 



The oval marks the interior of a facet, i.e. excluding the grey boundary. 

Dotted lines mark the interior of the three-dimensional body, i.e. excluding its surface 
(facets, edges and vertices). 
Theorem 10. For $(p, q, r)$ in the respective areas, one has: a pair $(X, Y)$ with $Z = XY - YX$ is $(p, q, r)$-maximal if and only if there exist a unitary $U \in \mathbb{C}^{d\times d}$ and $X_0, Y_0, Z_0 \in \mathbb{C}^{2\times 2}$ with $\operatorname{Tr} X_0 = \operatorname{Tr} Y_0 = 0 = \operatorname{Tr}(Y_0^*X_0)$ such that

$$UXU^* = X_0 \oplus O, \qquad UYU^* = Y_0 \oplus O, \qquad UZU^* = Z_0 \oplus O,$$
and moreover: 
1) $X_0, Y_0$ are arbitrary otherwise,

2a) $\operatorname{Rank} X_0 = 1$,
2b) $\operatorname{Rank} Y_0 = 1$,
2c) $\operatorname{Rank} Z_0 = 1$,

3a) $\operatorname{Rank} X_0 = \operatorname{Rank} Y_0 = 1$,
3b) $\operatorname{Rank} Y_0 = \operatorname{Rank} Z_0 = 1$,
3c) $\operatorname{Rank} X_0 = \operatorname{Rank} Z_0 = 1$,

4a) $X_0$ is a non-zero multiple of a unitary matrix,
4b) $Y_0$ is a non-zero multiple of a unitary matrix,
4c) $Z_0$ is a non-zero multiple of a unitary matrix,

5) $X_0$, $Y_0$ and $Z_0$ are all non-trivial multiples of unitary matrices.

A look at Theorem 3 should make clear why we do not describe the regions of parameter space by (in)equalities at this point.

Proof.

The starting point of the investigations is 1), which has been proven in [?] as Theorems 3.1 and 3.2.

The parts 2) are results of Lemma 8. For this, recall the construction of the values in these areas by monotonicity in Section 3.1. For instance, in the case 2a) $q$ was decreased. As a consequence, $X$ must be a rank one matrix.

The parts 3) are similarly easy. But, beginning with a triangle from 2), a second monotonicity step along another direction is applicable.

As pictured for 3a), we decrease $r$, yielding additionally $\operatorname{Rank} Y = 1$. Observe that, except for $p = 1$, Lemma 9 and Lemma 8 (for transferring to $q = 1$ and $r = 1$) grant the similarity to $2 \times 2$ matrices.

For the last segment of this area, remember that we closed the gap by interpolation with one base point in the triangle at $p = 1$. These base points can be handled by monotonicity in two different directions, granting $\operatorname{Rank} X = \operatorname{Rank} Y = 1$. The similarity relation is now simply verified directly. Then by Lemma 9 all interpolants inherit this property.

The other two parts 3b) and 3c) may be proven in a similar fashion or can be shown by a symmetry argument. Take a look at the proof of Proposition 4 and observe that properties of maximal pairs are indeed swapped between $X$, $Y$ and $XY - YX$ as stated.



A small hint for direct use with 3c):

In one half of the area it is easy to see that $\operatorname{Rank} X = 1$ and $\operatorname{Rank} Z = 1$ with the help of monotonicity.

In the blue area we know for sure the $(2, 2, 2)$-maximality. Hence,

$$\sigma(Y) = (y_1, y_2, 0, \ldots), \qquad \sigma(Z) = (z_1, z_2, 0, \ldots),$$

and because of $\operatorname{Rank} X = 1$ (due to monotonicity), one may assume for maximal pairs

$$\cdots\; 2^{1-1/r} \cdots \;\le\; \cdots$$



yielding $\|Z\|_r = \|Z\|_2$, or equivalently Rank Z = 1. At this point the rank-specific
estimates from the last remark come into play.

The brown line inherits the properties of the blue area by monotonicity (Lemma El). For the red
line, a simple calculation gives

$$\|Z\|_1 \le 2\|X\|_1\|Y\|_\infty = 2\|X\|_\infty\|Y\|_\infty = \|Z\|_\infty,$$

which also gives Rank Z = 1. Here, Rank X = 1 was again a consequence of Lemma [5].
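In our reading of the display, the red-line estimate combines three facts. First, the triangle and Hölder inequalities give ‖XY − YX‖_1 ≤ 2‖X‖_1‖Y‖_∞. Second, a rank-one X satisfies ‖X‖_1 = ‖X‖_∞. Third, the final equality uses maximality with constant 2. The numerical sketch below checks the first two facts on random 2 × 2 examples with rank-one X. It also shows one pair, chosen by us for illustration, where every step holds with equality and Z indeed has rank one: X = E_12 and Y = diag(1, −1). Singular values of a 2 × 2 matrix are computed in closed form from σ_1² + σ_2² = ‖A‖_F² and σ_1σ_2 = |det A|. All helper names are hypothetical, and this is a sanity check, not a proof.

```python
import math
import random


def singular_values_2x2(a):
    """Singular values (s1 >= s2) of a complex 2x2 matrix, from
    s1^2 + s2^2 = ||a||_F^2 and s1 * s2 = |det a|."""
    (a11, a12), (a21, a22) = a
    fro2 = sum(abs(v) ** 2 for v in (a11, a12, a21, a22))
    det = abs(a11 * a22 - a12 * a21)
    s1 = math.sqrt((fro2 + math.sqrt(max(fro2 ** 2 - 4 * det ** 2, 0.0))) / 2)
    s2 = det / s1 if s1 > 0 else 0.0   # avoids cancellation in (fro2 - disc) / 2
    return s1, s2


def schatten(a, p):
    s = singular_values_2x2(a)
    return max(s) if math.isinf(p) else sum(v ** p for v in s) ** (1.0 / p)


def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def commutator(x, y):
    xy, yx = matmul2(x, y), matmul2(y, x)
    return [[xy[i][j] - yx[i][j] for j in range(2)] for i in range(2)]


# Equality case (our own example): X = E_12 has rank one, Y = diag(1, -1).
X = [[0, 1], [0, 0]]
Y = [[1, 0], [0, -1]]
Z = commutator(X, Y)                       # [[0, -2], [0, 0]]
chain = (schatten(Z, 1), 2 * schatten(X, 1) * schatten(Y, math.inf),
         2 * schatten(X, math.inf) * schatten(Y, math.inf), schatten(Z, math.inf))
print(chain)                               # all four values equal 2

# Random rank-one X = x u^T and arbitrary Y: the first two facts always hold.
rng = random.Random(0)


def rand_c():
    return complex(rng.gauss(0, 1), rng.gauss(0, 1))


ok = True
for _ in range(1000):
    x, u = (rand_c(), rand_c()), (rand_c(), rand_c())
    Xr = [[x[i] * u[j] for j in range(2)] for i in range(2)]
    Yr = [[rand_c() for _ in range(2)] for _ in range(2)]
    Zr = commutator(Xr, Yr)
    bound = 2 * schatten(Xr, 1) * schatten(Yr, math.inf)
    ok &= schatten(Zr, 1) <= bound * (1 + 1e-12)
    ok &= abs(schatten(Xr, 1) - schatten(Xr, math.inf)) <= 1e-9 * schatten(Xr, math.inf)
print(ok)
```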

Now that, on the facet, every maximal pair comes with two rank-one matrices, the same holds for the
second half of the 3c) area by Lemma El.

For 4c), i.e. the triplets (p, q, q'), one checks, using the (q, q, q')-maximality and subsequently
the (2, 2, 2)-maximality in one half of the triangle, that Z has only two non-zero singular values.
Hence we can write



$$\cdots = \cdots \le 2^{1/p-1/q}\,\|Z\|\cdots$$

yielding

$$\|Z\|_p \ge 2^{1/p-1/q}\,\|Z\|_q,$$



which results in $\sigma(Z) = (c, c, 0, \ldots)$ for some c > 0, as claimed. In the second half of
the triangle we have (q', q, q')-maximality, and the conclusions are analogous.
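The conclusion σ(Z) = (c, c, 0, ...) uses the power-mean inequality for two numbers. If σ(Z) = (z_1, z_2, 0, ...) and 1 ≤ p < q, then ‖Z‖_p ≤ 2^{1/p−1/q}‖Z‖_q, with equality exactly when z_1 = z_2. For p > q the inequality reverses, with the same equality case. So an estimate in the opposite direction forces equal singular values. The short check below evaluates the gap between the two sides for finite exponents. The function names are ours.

```python
import itertools


def schatten_two(z1, z2, p):
    """Schatten p-norm of a matrix with singular values (z1, z2, 0, ...), p finite."""
    return (z1 ** p + z2 ** p) ** (1.0 / p)


def gap(z1, z2, p, q):
    """2^(1/p - 1/q) * ||Z||_q - ||Z||_p; non-negative for 1 <= p <= q."""
    return 2 ** (1.0 / p - 1.0 / q) * schatten_two(z1, z2, q) - schatten_two(z1, z2, p)


# The gap vanishes (up to rounding) for equal singular values ...
print(gap(1.0, 1.0, 1, 3), gap(2.5, 2.5, 1.5, 4))
# ... and is strictly positive otherwise (here p = 1, q = 3).
print(gap(3.0, 1.0, 1, 3))

# Check on a small grid with p < q.
values = [0.1, 0.5, 1.0, 2.0, 7.0]
exponents = [1, 1.5, 2, 3, 6]
grid_ok = all(gap(z1, z2, p, q) >= -1e-12
              for z1, z2 in itertools.product(values, repeat=2)
              for p, q in itertools.combinations(exponents, 2))
print(grid_ok)
```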
The area for 5) can be handled by interpolation. However, we will do this differently than in
Section 2, namely by interpolating two indices simultaneously (p and r, or p and q) while keeping
the other index fixed. Here ordinary interpolation (for K_X or K_Y) already suffices, in line with
what was observed in Section 5.3.

The picture illustrates one such process for fixed X and q. By this process, the unitarity
properties of Yq and Zq are inherited by the interpolants, too.



Another interpolation process (fixing Y and r) complements the first one and determines the
properties of Xq and Zq as well. Since every interior point is an interpolant with respect to both
processes, all three matrices are of the asserted type. □
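In the interpolation processes above, one index stays fixed and the other two move together. For fixed X, the commutator Y ↦ XY − YX is linear, so Riesz–Thorin interpolation between Schatten classes applies to it. Under the usual convention, which we assume here, moving along a segment means that the reciprocals 1/p and 1/r (or 1/p and 1/q) vary affinely in a parameter θ ∈ [0, 1]. Operator bounds K_0 at θ = 0 and K_1 at θ = 1 then yield the bound K_0^{1−θ}K_1^θ at the interpolant. The helper below only computes these interpolated exponents and bounds. Its names and sample base points are hypothetical and are not taken from the paper's figures.

```python
import math


def reciprocal(p):
    """1/p with the convention 1/inf = 0."""
    return 0.0 if math.isinf(p) else 1.0 / p


def interpolate_exponent(p0, p1, theta):
    """Exponent p_theta defined by 1/p_theta = (1 - theta)/p0 + theta/p1."""
    s = (1.0 - theta) * reciprocal(p0) + theta * reciprocal(p1)
    return math.inf if s == 0 else 1.0 / s


def interpolate_pair(base0, base1, theta):
    """Move two indices simultaneously, e.g. (p, r) with X and q held fixed."""
    return tuple(interpolate_exponent(a, b, theta) for a, b in zip(base0, base1))


def riesz_thorin_bound(k0, k1, theta):
    """Riesz-Thorin bound for the interpolated operator norm: k0^(1-theta) k1^theta."""
    return k0 ** (1.0 - theta) * k1 ** theta


# Hypothetical base points (p, r) = (1, 1) and (inf, inf) with bounds 1 and 4.
for theta in (0.0, 0.25, 0.5, 1.0):
    print(theta, interpolate_pair((1, 1), (math.inf, math.inf), theta),
          riesz_thorin_bound(1.0, 4.0, theta))
```

Working with reciprocals keeps the path a straight segment in the (1/p, 1/q, 1/r) picture, which is where convexity arguments of this kind are naturally formulated.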

We remark that a property in 3) automatically implies the corresponding property in 4) for the
complementary variable, e.g. 3a) ⇒ 4c), but the converse is not true. Notice that one bounding facet
is not covered by the methods of this proof.

6. Conclusions 

In accordance with the title of this paper, we want to point out several occasions at which the
'river of convexity' crossed our way. First of all, we used a specialized convexity theorem, namely
Riesz-Thorin interpolation, in Sections 2 and 3. We have seen that in many cases this theorem, in its
usual form, does an excellent job. It even applies to the bilinear commutator map once one of the
variables is fixed. One major obstacle could be overcome efficiently by applying this theorem to some
unusual structures: by interpolating along lines that together build up planes, we established new
base points for subsequent interpolation steps. In summary, these axis-oriented processes can cover
even rather complicated regions of parameter space and may give strong estimates.

Furthermore, we demonstrated that a number of (in)equalities and descriptive processes can be
illustrated in an intuitive way. We hope the reader enjoyed using this graphical tool rather than
having to comb through a vast array of formulas.

In Section 2 we encountered convexity multiple times. We have seen convex functions (and their
concave counterparts), convex sets, and properties related to both of them with regard to extremal
points. There, we also drew connections to a visually appealing geometrical problem.